More than hot air


US President Barack Obama gave a fine speech on global warming, but now he must deliver
on regulations for coal power and greater fuel economy.


Before a major speech on the subject last week, it had been two
years since US President Barack Obama last waded into the
complex arena of energy and climate change. His emphasis then
was on an ‘all-of-the-above’ approach that put oil and natural gas on
an even keel with alternative energy sources.

But on 25 June, citing “the overwhelming judgement of science”, as
well as the country’s founding fathers, who charged political leaders
“to make decisions with an eye on a longer horizon than the arc of our
own political careers”, Obama broke a long silence on global warming.

The centrepiece of the president’s speech was a pledge to regulate
carbon emissions from power plants new and old. The power sector
produces some 40% of total US emissions, and administration officials
have long said that they would fill the regulatory void if Congress failed
to act. Although Obama did not make any specific promises last week,
he did lay out a schedule and put the full weight of the White House
behind these efforts, which is what they need and deserve.

These commitments are overdue. The US Environmental Protection
Agency (EPA) has already proposed a regulation that would
essentially ban the construction of new power plants unless they are
equipped to capture and sequester carbon. That rule has languished
for over a year, and under the new schedule will not be finished for
almost another 12 months. Many of Obama’s most ardent supporters,
as well as his critics, had long assumed that the EPA was already working
on regulations for existing power plants. Apparently it wasn’t — at
least, not in any serious way. Obama has now ordered the agency to
issue a regulatory proposal next June and to finalize the rules a year
after that, just in time for a major United Nations climate summit
in Paris.

Obama’s ‘climate action plan’ contained a variety of other initiatives,
including calls for a new round of appliance standards, fuel-economy
regulations on heavy-duty vehicles and various efforts
intended to prepare the country for a warmer climate. Much of the
plan may seem old hat, but that is to the president’s credit. Over the
years, his administration has cobbled together a broad set of policies
that — along with a shift from coal to natural gas and renewables for
electricity generation, as well as several years of economic woe —
have markedly reduced greenhouse-gas emissions, which registered
almost 7% below 2005 levels in 2011.

But the United States still has a long way to go if it is to fulfil its
international commitment — a 17% reduction by 2020 — and pursue deep
emissions reductions as the century wears on. Having secured historic
fuel-economy regulations across the vehicle sector, Obama now has
the opportunity to lay down an aggressive set of regulations for the
power sector. It will be up to the EPA, working with states, businesses
and environmentalists, to determine how to structure the regulations.
Rather than focusing purely on technological upgrades such as requiring
more efficient boilers, the EPA may be able to improve on broader
incentives that would require deeper reductions while, for example,
allowing utilities to work with customers to curb electricity demand.

Obama also hinted that he could deny the proposed Keystone
pipeline from Alberta to the United States — if the state department's
ongoing analysis determines that it would significantly exacerbate
greenhouse-gas emissions. In truth, oil from the tar sands is hardly the
dirtiest resource from a climate perspective, but it is not the cleanest
either. And even a cursory review of the local environmental impacts
suggests plenty of reasons to shift investments towards cleaner
alternatives. Regulating greenhouse-gas emissions from the power sector
is by far the biggest opportunity, but if the administration feels it
can justify a symbolic decision against Keystone and still move a
workable and effective regulatory agenda forwards, then so be it.

Whatever form the regulations take, and however ingeniously the
administration can work around political opposition, the full scale of
the climate challenge is more than any president could accomplish
independently of Congress. Obama urged politicians and public servants
to rise above the political fray and think beyond the next election, to
live up to their obligations not just as “custodians of the present, but
as caretakers of the future”.

Obama is just six months into his second term, but these are the
words of a president who no longer needs to worry about re-election.
Obama is now thinking about his place in history. Although his
broader climate agenda has been stymied in Congress, Obama has laid
out a solid path forward. Now he must follow it through.


Russian roulette 


Reforms without consultation will destroy the 
Russian Academy of Sciences. 


The Russian Academy of Sciences has seen and survived its share
of political turmoil in its nearly 300-year history. Yet recent
decades have not been kind: the academy has been in a state of
decline since the fall of the Soviet Union in 1991.

When funding, generous in Soviet times, declined drastically in the
1990s, too many of the academy’s ageing — and increasingly
unproductive — members became preoccupied with securing personal
privileges. Last year, an internal assessment of the academy’s science
managed to conclude that each of the academy’s 400 institutes performs
world-class research; typically, no external scientists were consulted. In
fact, by all measures, only a small fraction of academy institutes can be
considered internationally competitive. Many produce only poor
science — and outsiders have criticized the organization again and again
for refusing to accept the dire reality of its situation.

The problems have not gone unnoticed by the Russian government.
Tensions between the science ministry and the academy have risen
in recent years, as the government has become increasingly worried
about Russian science’s lack of competitiveness. The stand-off
approached a dramatic climax last week, when a bill was hastily
introduced to the Russian parliament that, if approved, would effectively
liquidate the academy in its present form. The academy is ill, of that
there is no doubt. But the proposed cure would kill it off. Worse, the
bill is marked with the worrisome signs of autocracy that characterize
Russian President Vladimir Putin’s current regime.

The planned coup would merge the Academy of Sciences with Russia’s
minor medical and agricultural academies, and would provide all
members of the united body with equal status as academicians. The
present academy would lose the right to manage its property and, 
more importantly, would cease to operate research institutes of its own. 
Existing institutes would be evaluated, and those deemed competitive 
would in future be run by a new government agency on behalf of the 
academy. Putin hoped to turn the proposal into law without giving 
the academy time to respond, although the parliament's final vote has 
now been postponed to October. 

The proposal has caused an outcry from Russian scientists. 
Researchers have laid down flowers near the academy’s headquarters
on Leninski Prospect in Moscow in a symbolic funeral for the institution,
which was founded in 1724 by Russian Emperor Peter the Great.

However, it is not the bill’s aim and content that are most troubling,
but the hasty and profoundly undemocratic manner in which it was
conceived. Vladimir Fortov, the academy’s newly elected president and
a reformer who has announced a number of measures to rejuvenate
and restructure the organization (see Nature 497, 420-421; 2013), was
not consulted. Neither were the institution's scientific workforce and
the trade unions.

Some Western-orientated Russian scientists acknowledge that a
number of the proposed changes could be beneficial. In effect, the
reform would create a flexible learned body — similar to scientific
academies in the United States and much of Europe, whose main
duties are to provide the government with scientific advice on
questions of societal relevance. The task of organizing and funding
the research itself would be passed on to a new agency — similar to
Germany’s Max Planck Society — that, if properly run, could provide
basic science in Russia with much-needed vision and impetus.

But such sweeping changes require more time and preparation than
Putin seems willing to grant. An organization that employs more than
45,000 scientists cannot be successfully transformed overnight. Russian
scientists have a right to be heard and consulted, and they should
have been. For the sake of Russian science, members of the parliament
should refrain from hastily passing an ill-prepared bill; they should
wait until at least the basic technicalities of what is indeed a
much-needed reform have been thoroughly worked out and made public.
The government and the academy should set up an expert committee
of respected scientists and give it at least 12 months to plan the
transition. If the result is to be a system that rewards excellence and can
give solid advice to those in power, then Russia can wait one more year.


Presumed consent 


More must be done to boost tissue donation for 
transplantation and research. 


Despite decades of scientific progress in the field of organ
transplantation, there remains a crippling shortage of suitable tissue
from willing donors. Actually, make that donors who have
made it clear that they would be willing. Surveys in Wales, for example,
have shown that although some two-thirds of people asked say that they
would be willing to see their heart, liver, lungs and other tissues reused
after their death, only half of those people go as far as registering their
consent on the organ-donation register. The resulting shortage, according
to Mark Drakeford, the Welsh health minister, means that one person
dies in his country almost every week while waiting for a donor.

As Nature went to press, the Welsh Assembly was voting on a proposed
change in the rules. It would see Wales reverse the donation
dynamic — on death, an adult's organs will automatically be considered
for transplantation, unless that person previously made it clear
this was against their wishes. A new register would record the names
of those who do not wish to be classed as donors.

If passed, the ‘presumed consent’ scheme would come into force in
2015. Although the family of someone who died without registering
to opt out would have no legal right to block use of that person's body
parts, in practice officials say they would be given the opportunity
to show that their loved one would not have wanted to donate. This
‘soft’ scheme is similar to that in operation in Spain. Austria takes a
stronger line and its ‘hard’ opt-out means that if someone dies without
registering their dissent, then their organs are considered fair game.

The vote comes at a time of increasing scrutiny of the way in which
tissue taken during hospital procedures is used in medical and
scientific research. Last week, Nature told the largely unexplored story
of the WI-38 cell line, derived from a fetus aborted from a woman in
Sweden (see Nature 498, 422-426; 2013). And Rebecca Skloot’s book
The Immortal Life of Henrietta Lacks (Crown, 2010), the history of the
HeLa cell line and the ethical issues it raises, continues to sell. Consent
— in medicine and science — has become a key issue.

It also comes at a time when there remains a critical shortage of some
tissues for research — the brains of children for example, which are 
needed for work on autism and schizophrenia. Advocates and patient 
groups are already working on ways to confront the biggest obstacle 
— the emotionally fraught conversation with devastated parents who 
have lost a child (see Nature 478, 427; 2011). By talking to the parents of 
children with autism about the benefits of donation, for example, they 
can increase the chances of gaining consent should the worst happen. 

Presumed consent, with the burden placed on people and families to
opt out of tissue donation, seems a step too far at present for material
needed for scientific research. But are the issues involved that different 
from those surrounding transplantation? Both promise better health 
and new life from the waste of death. 

One important motivation when it comes to organ donation is that 
there is little alternative. If someone with a failing organ today does not 
find a willing donor, they may not see tomorrow. That may not always 
be the case. As a News Feature on page 20 investigates, researchers 
are using tissue-engineering techniques to build artificial hearts in 
the laboratory. A Letter published online this week describes the use 
of induced pluripotent stem cells to grow human liver tissue in mice 
(T. Takebe et al. Nature http://dx.doi.org/10.1038/nature12271; 2013). 
And, last month, Japan announced plans to relax a ban on experiments 
that mix human and animal cells, which could be used to generate 
transplantable human organs in pigs. 

For now, such research is of little comfort to those waiting for someone
else to die. The planned change in Wales
goes some way towards making the bodies of the
deceased more widely available. And it shows
that, given the chance, the kindness of strangers,
as well as their consent, can be presumed.


Biology must develop its
own big-data systems


Too many data-management projects fail because they ignore the changing
nature of life-sciences data, argues John Boyle.


The last week of April was designated Big Data Week. But in
modern biology, every week is big-data week: life-sciences
research now routinely churns out more information than
scientists can analyse without help. That help increasingly comes in
the form of expensive data-management systems, but these are hard
to design and most are even harder to use. As a result, a long line of
data-management projects in the life sciences — many of which I have
been involved with — have failed.

The size, complexity and heterogeneity of the data generated in
labs across the world can only increase, and the introduction of cloud
computing will encourage the same mistakes. Just a stone's throw from
where I work, at least three computer companies are already touting
cloud-based data-management systems for the life sciences. We
need to find ways to manage and integrate data
to make discoveries in fields such as genomics,
and we need to do this quickly.

At their most basic, data-management systems
allow people to organize and share information.
In the case of small amounts of uniform data
from a single experiment, this can be done with
a spreadsheet. But with multiple experiments
that produce diverse data — on gene expression,
metabolites and protein abundance, for example
— we need something more sophisticated.

An ideal data-management system would store
data, provide common and secure access methods,
and allow for linking, annotation and a way to
query and retrieve information. It would be able to
cope with data in different locations — on remote
servers, on desktops, in a database or spread across
different machines — and formats, including
spreadsheets, badly named files, blogs or even scanned-in notebooks.

That ideal system does not exist. Most academic organizations have,
through trial and error, developed their own in-house systems that
work — or just about. The systems have limited functionality and cannot
be connected, which makes collaboration difficult. The situation
is as unworkable as if every lab in the country had decided to devise
its own (poor) document-editing software.

Efforts to introduce overarching data-management systems, to
which any and all scientists in a particular field could plug in, have
failed for two main reasons. Either they demand that scientists
change the format of their data, to allow information to be entered
into the system, or they demand that scientists change the way they
work, to generate standardized sets of results. The systems are thrust
on scientists who are then expected to change,
rather than taking the work of scientists as a


These problems are exemplified by the expensive flop that was the 
US National Cancer Institute's caBIG data-integration project, scrapped 
last year after almost a decade and tens or even hundreds of millions 
of dollars. It had admirable goals and seemed workable in theory, but 
in the end it was too complicated to use. Crucially, caBIG relied on 
standardized data formats, which called for standardized experiments. 
Its one-size-fits-all approach fit nearly nobody. 

There have been some successes. A widely used system called SRS 
allows the linking of data held in separate well-structured repositories.
And the Biomart project joins up specially designed databases.
But these were both fairly bespoke research applications; computer 
giants Microsoft and IBM are among the commercial firms that 
have introduced systems that aimed at a wider reach but had little 
impact. 

To be useful to the life-sciences community, a 
data-management system probably needs to be 
devised and developed by the life-sciences community.
The US National Institutes of Health
has a ‘Big Data’ initiative, and agency head Francis
Collins has spoken many times of the need
to address the problem. Now is the time for 
researchers to plan an open data-management 
system that scientists will want to adopt. Many of 
the software pieces are already available. 

As a starting point, here are three lessons from
the successes and failures of the past. 

First, the data are going to change. Biological 
information will always come in varied formats, 
and these formats cannot be defined in advance. 
Software engineers hate this. But a useful system 
must be flexible and updatable. 

Second, people are not going to change. Busy scientists will adopt 
a new system only if it offers substantial benefit and is painless. Many
commercial systems are unpopular because they make simple steps 
such as data retrieval complicated, to stop scientists using several 
(rival) systems at once. 

Third, the problem is not technical. Although the latest kit is always 
alluring to funders, today’s cutting-edge devices will be blunt tomorrow.
Data-management systems must be driven by the need to find a
workable solution to the problem, not by a desire to make the problem 
fit the latest fashionable technology. 

Development of a biology-friendly system is possible, but it will 
require a change in mentality. As a useful test, a good data-management
system should cost more to maintain, update and change with
the times than it does to develop. Otherwise the price is too high.


John Boyle will shortly become senior director of bioinformatics at 
Kymab in Cambridge, UK. 
e-mail: john.boyle@kymab.com


A window into
nerve repair 


Some neurons regenerate 
better than others. 

Researchers led by Vincenzo 
De Paola at Imperial College 
London severed nerve cells 
in mouse brains, using lasers 
to minimize scarring and 
inflammation. The authors 
set glass panes into the skulls 
of the animals and monitored 
regrowth in more than 
100 neurons for up to a year. 
More than half of the cut 
neurons from the deepest layer 
of the brain’s cortex regrew, but 
only about one-fifth of those 
in its other layers did. Neurons 
in the brains of juvenile mice 
were also more likely to regrow 
than those in adult brains. 

Regrowth depends, at 
least in part, on the neurons 
themselves and not just 
external factors such as neural- 
support cells, the authors say. 
They suggest that long-term 
imaging could be used to 
test potential neuron-repair 
strategies in the brains of living 
animals. 

Nature Commun. 4, 2038 (2013) 


Selections from the
scientific literature


Ratchet action
misshapes pearls


Perfectly round pearls
(pictured) owe their
spherical shape to spiral
growth patterns of nacre,
the iridescent material also
known as mother of pearl.
By contrast, non-spherical
pearls such as drop pearls
have longitudinal growth
fronts positioned such that
they work like teeth on a
ratchet, spinning the gem as
it grows in an oyster.

Julyan Cartwright of the
University of Granada, Spain,
and his team calculated
the forces exerted by nacre
particles sticking to and
bouncing off the growth
fronts of a developing pearl.
The forces proved strong
enough to rotate the pearl
once every 20 days (the
speed at which pearls have
previously been found to
rotate) and to influence its
ultimate shape.

Microscopic control over
macroscopic motion could
be a useful design principle
for building tiny machines,
the researchers suggest.

Langmuir http://dx.doi.org/10.1021/la4014202 (2013)


Familiar nest sites beat better lakes


When common loons (Gavia immer, pictured)
settle down to breed, they pick sites similar to
the ones they hatched in, even if better sites are
available.

As part of a 20-year study, researchers led by
Walter Piper at Chapman University in Orange,
California, tagged and observed birds across
glacial lakes in the north-central United States.
Loons that were reared on small, acidic lakes
tended to settle on similar sites, even though
large, less-acidic lakes can support more and
healthier chicks. The researchers suggest that
adult loons might survive best on lakes that offer
the types of fish and other prey that the birds
are most familiar with. A trade-off between
reproductive success and survival rate could
help to explain the apparently maladaptive
habitat choices seen in loons and other species,
the authors say.

Proc. R. Soc. B 280, 20130979 (2013)


Drug outdoes
standard therapy


A large clinical trial has
confirmed the promise of
a targeted drug therapy in
advanced non-small-cell
lung cancer.

The drug crizotinib, which
targets an oncogenic protein
encoded by the mutated ALK
gene, extended progression-free
survival in patients
with ALK mutations to
7.7 months, compared with
3 months for chemotherapy
alone. The results from the
trial, which included 347
patients, are reported by
Alice Shaw at Massachusetts
General Hospital in Boston
and her colleagues, and
come just six years after the
discovery of ALK fusion
mutations in cancer and
two years after the drug was
approved for non-small-cell
lung carcinoma in the United
States on the basis of smaller
clinical trials.

A related paper from
a team also led by Shaw
reports a new mechanism
of resistance to crizotinib in
one patient, showing that the
search for effective targeted
treatments must continue.

N. Engl. J. Med. 368, 2385-2394;
368, 2395-2401 (2013)


8 | NATURE | VOL 499 | 4 JULY 2013

© 2013 Macmillan Publishers Limited. All rights reserved


PALAEONTOLOGY 


Ancient ‘starfish’ 
had a helix 


Five rays twisting down 
from the top of a fossil hint 
at how creatures such as 
starfish gained their unusual 
symmetry. 

Starfish, sea urchins 
and all other known 
living echinoderms have a
symmetry that allows them 
to be sliced into five identical 
parts, but some of their 
counterparts in the Cambrian 
period, which began about 
540 million years ago, were 
asymmetric or had bilateral 
symmetry. 

Andrew Smith at the 
Natural History Museum in 
London and Samuel Zamora 
at the Smithsonian Institution 
in Washington DC discovered 
Cambrian fossils in Morocco 
that show what stages 
intermediate to the body plan 
of living echinoderms might 
have looked like. 

Helicocystis moroccoensis 
(pictured) is the oldest 
known echinoderm with five- 
part symmetry; it resembled 
an egg with its tapered end 
planted in the sea floor. Its 
mouth opened upward and its 
body spiralled down. 

Proc. R. Soc. B 280, 20131197 
(2013) 


ZOOLOGY 


Hot sex for 
jawless fish 


After dancing seductively for 
their potential mates, male 
sea lampreys (Petromyzon 
marinus) crank up the heat, 
literally, using a ridge of tissue 
on their backs. 

Courtship behaviour 
of lampreys — eel-like, 
bloodsucking, jawless fishes 
— includes the male rubbing 
his ridge against the belly of an 
interested female. Researchers 
had assumed that this simply 
aroused females mechanically, 
but when Weiming Li and his 
colleagues at Michigan State 
University in East Lansing 
dissected the tissue, they found 
that ridges from mature males 
were full of cells packed with oil 
droplets and cells primed for 
energy production, a hallmark 
of heat-producing tissue. The 
ridge temperature in males 
jumped by up to 0.3°C in the 
presence of sexually mature 
females. 

The authors say that the 
ridge is the first example of a 
heat-generating sexual trait. 

J. Exp. Biol. 216, 2702-2712 
(2013) 


GEOSCIENCE


Earthquakes sink
volcanoes


Giant earthquakes in
subduction zones do not just
create tsunamis — they can also
cause nearby volcanic regions
to sink, possibly altering the
risk of eruptions.

In subduction zones, one
plate of Earth’s crust plunges
beneath another. Quakes
cause the overriding plate to
expand and subside. Volcanoes
on these plates subside even
further, according to satellite
radar data from two regions.

Youichiro Takada and
Yo Fukushima at Kyoto
University, Japan, measured
drops in volcanic regions of
up to 15 centimetres near
the fault that broke in the
2011 magnitude-9.0 Tohoku
earthquake. Separately,
Matt Pritchard at Cornell
University in Ithaca, New
York, and his colleagues
measured subsidence of up to
15 centimetres within weeks
of the 2010 magnitude-8.8
Maule earthquake off the coast
of Chile.

The authors of the Japanese
study suggest that the
subsidence occurred because
reservoirs of magma below the
volcanoes sank. By contrast, the
authors of the Chilean study say
that hydrothermal reservoirs
may have drained, causing the
ground above to collapse.

Nature Geosci. http://dx.doi.org/10.1038/ngeo1857;
http://dx.doi.org/10.1038/ngeo1855
(2013)


COMMUNITY CHOICE


The most viewed
papers in science


CROP SCIENCES


Super-broccoli secret solved


HIGHLY READ on wiley.com in May

A single gene is probably responsible for high
levels of sulphur-containing compounds in
new commercial varieties of broccoli.

Richard Mithen at the Institute of Food
Research in Norwich, UK, and his group analysed hundreds
of genetic markers in broccoli hybrids (pictured) bred to
produce more glucoraphanin, a compound with reported
health benefits. The team had previously created the three
hybrid lines by crossing common broccoli (Brassica oleracea)
and a wild Sicilian cousin (Brassica villosa) multiple times. The
analysis showed that the hybrids had all inherited a version of
a gene from B. villosa. The gene, called Myb28, also regulates
glucoraphanin production in the model plant Arabidopsis.
Field trials under diverse conditions showed that the hybrids
consistently had higher levels of the compound. The plants
both drew more sulphur-containing building blocks from
the soil and shunted a greater
portion of them towards
glucoraphanin production. The
work paves the way for blinded
human studies that assess the
health benefits of eating the
glucoraphanin-rich broccoli,
the authors say.

New Phytol. 198, 1085-1095
(2013)


AGEING

Clock blocked 
by age 


A protein linked to ageing and 
metabolic disease might control 
the brain’s internal clock. 


The protein SIRT1 regulates 
the expression of many 
genes and has been linked to 
daily biological cycles called 
circadian rhythms in tissues 
such as fat and the liver. Hung- 
Chun Chang and Leonard 
Guarente at the Massachusetts 
Institute of Technology in 
Cambridge found that in 
mouse brains, SIRT1 switches
on two proteins that are known 
to regulate circadian rhythms. 

Aged mice were slower 
than young mice to adjust to 
shifts in light-dark cycles, and 
expressed lower levels of SIRT1 
in the brain region that sets 
circadian rhythms. Boosting 
SIRT1 levels shortened 
animals’ adjustment time, 
whereas depleting SIRT1 
lengthened it. 
Cell 153, 1448-1460 (2013) 




SEVEN DAYS


Solar mission 


NASA’s latest solar mission
reached orbit safely on
27 June. The Interface Region
Imaging Spectrograph was
released by an Orbital Sciences
Pegasus XL rocket, which was
launched from Vandenberg
Air Force Base in Lompoc,
California. The US$181-million
spacecraft carries
a 20-centimetre ultraviolet
telescope and spectrograph,
designed to probe the layers
of the Sun between its
bright surface and its outer
atmosphere, or corona (see
Nature 498, 279-280; 2013).


DNA transplants


On 28 June, the UK
government announced that it
will publish draft regulations
later this year with a view
to allowing and governing
DNA transplants in in vitro
fertilization that could prevent
certain heritable diseases.
The United Kingdom may
become the first country to
legalize the technique, which
involves transplanting nuclear
DNA from eggs or embryos
with faulty mitochondria
into healthy donor cells.
The regulations will be open
to public consultation and
debated by parliament in 2014.


Clinical-trial ethics


US regulators announced plans
on 26 June for a public meeting
to discuss ethical issues in
studies of ‘standard of care’
treatments — those commonly
used in clinical practice. In
March, the Office for Human
Research Protections (OHRP)
criticized a study in infants
— the Surfactant, Positive
Pressure, and Oxygenation
Randomized Trial (SUPPORT)
— for failing to adequately
disclose risks associated with
different blood-oxygen-saturation
levels used to
support extremely premature
babies. But the criticism stirred
controversy among researchers,
ethicists and National Institutes
of Health officials, prompting
the OHRP to schedule a
28 August meeting.


Alaskan volcano eruption escalates


An ongoing eruption of Alaska’s Pavlof volcano
intensified on 25 June, when it spewed an
ash plume up to 8.5 kilometres high. Located
1,000 kilometres southwest of Anchorage,
Pavlof is one of the state's most active volcanoes.
It began erupting in mid-May (pictured on
18 May), and the Alaska Volcano Observatory
is watching it closely because of its potential
impact on aeroplane flights across the North
Pacific. But four of the nine seismic stations
that monitor Pavlof have stopped working in
recent years, and budget cuts have prevented
the observatory from repairing them. Funding
cutbacks have halted real-time monitoring of at
least four of Alaska’s volcanoes. See
go.nature.com/at8eue for more.


African power 


Over the next five years, the 
US government will invest 
US$7 billion in an initiative to 
double access to electricity in 
sub-Saharan Africa, President 
Barack Obama announced 
on 30 June. It is estimated 
that more than two-thirds 

of people in the region lack 
electricity. The United States 
will initially work with six 
countries — Ethiopia, Ghana, 
Kenya, Liberia, Nigeria and 
Tanzania — to increase 
generation capacity by more
than 10,000 megawatts. The
project, called Power Africa, 
also includes $9 billion in 
contributions from industry 
partners around the world. 


Russian reform 


The Russian Academy of 
Sciences, Russia's main basic- 
research organization, is facing 
the most radical overhaul 

in its 290-year history. A 
government bill launched 

on 28 June sets out a plan to 
merge the academy with two 
minor academies for medicine 
and agriculture. Responsibility 
for its more than 400 research 
institutes would be transferred 
to a new government agency. 
The Russian parliament’s final 
vote on the bill is expected 

in October. See page 5 and
go.nature.com/be5pyw
for more.


Obama on climate


Faced with continued political 
gridlock on climate policy, US 
President Barack Obama has 
ordered the Environmental 
Protection Agency (EPA) 

to regulate carbon dioxide 
emissions from existing power 
plants. The Clean Air Act 
regulations are the centrepiece 
of a broader climate strategy 
unveiled on 25 June and will 
be developed over the next 
two years. The president also 
ordered the EPA to complete 
work on an existing regulatory 
proposal covering new power 
plants. See page 5 for more. 


HIV treatment 


Two men with HIV may be 
on the road to being cured, 
their doctors said on 3 July at 




a meeting of the International 
AIDS Society in Kuala 
Lumpur, Malaysia. The men 
received stem-cell transplants 
to treat blood cancer, 

then stopped taking their 
antiretroviral medications, 
yet have no detectable trace 
of HIV DNA or RNA in their 
blood. It is still too early to 
say whether the men may be 
the third and fourth people 

to be essentially cured of HIV 
(see go.nature.com/2ka1lq). 
Also at the meeting, the 
World Health Organization 
said that HIV patients should 
begin antiretroviral treatment 
earlier than previously 
recommended, while their 
immune systems are still 
relatively strong. See go.nature. 
com/xchc4b for more. 


Chimp conclusion 
The US National Institutes of 
Health (NIH) will retire more 
than 300 research chimpanzees 
to sanctuaries over the next 
several years. No more than 

50 animals will be available 

for future studies, which must 
continue to meet stringent 
ethical and regulatory 
standards. The NIH’s 26 June 
announcement acts on a 2011 
report by the US Institute of 
Medicine, which declared 
most NIH-funded chimpanzee 
research scientifically 
unnecessary. The United States 
is the only major country that 
conducts invasive chimpanzee 
research, and the NIH provides
virtually all US federal funding
for such work. See go.nature.
com/1nbérr for more.


TREND WATCH


Researchers are submitting
skyrocketing numbers of
manuscripts for processing by
PubMed Central, the freely
accessible repository of the US
National Institutes of Health


PEOPLE 




No trial for Stapel 
Dutch social psychologist 
Diederik Stapel (pictured), 
who in 2011 was found to 
have fabricated data in at least 
30 published papers, will not 
face trial for misappropriating 
government research 

funds. Instead, in a pre-trial 
settlement, he has agreed 

to undertake 120 hours 

of community service. 

The Netherlands’ public 
prosecutor's office said on 

28 June that the public grants 
were not misused, as the money 
was mainly used to pay staff 
for their work — even though 
that work was based, in part, on 
fabricated data. See go.nature. 
com/zcquw68 for more. 


Commerce head 
Billionaire business executive 
Penny Pritzker was confirmed 
by the US Senate on 25 June 


as the new US secretary 

of commerce. Her job will 
include overseeing the US$5.3- 
billion National Oceanic and 
Atmospheric Administration 
(NOAA), which accounted for 
nearly 66% of the commerce 
department’s 2013 budget. 
Pritzker replaces John Bryson, 
who resigned in June 2012. 
The top job at NOAA remains 
open, however, after marine 
ecologist Jane Lubchenco left 
the agency in February. 


FUNDING
Horizon 2020 


European Union (EU) member 
states and the European 
Parliament agreed last week 

on details for Horizon 2020, 

an EU-wide research initiative 
set to begin in January 2014. 
The deal, which must still be 
formally approved, includes 
a highly simplified funding
model for all participants 

in the 7-year, €70-billion 
(US$91.2-billion) programme. 
Universities, research institutes 
and companies will be paid the 
full direct project costs, plus a 
25% flat rate to cover overhead 
expenses. See page 18 for more. 


Conservation aid 


Boosting international
aid to the 40 countries
where conservation is most
underfunded could help
to protect one-third of the
world’s threatened mammals,
according to a report released
on 1 July (A. Waldron et al.
Proc. Natl Acad. Sci. USA
http://dx.doi.org/10.1073/pnas.1221370110; 2013). The
study created a global database
of annual conservation
spending, and found that
funding correlates with a
country’s land area, gross
domestic product and threats
to biodiversity. The list of
underfunded countries
includes Iraq, Senegal and
France; such knowledge could
inform international spending
to prevent species loss, the
authors say.


(NIH). PubMed has received an
average of 8,800 manuscripts per
month this year, up from 5,100
per month in 2011 and 2012. Last
November, the NIH said that,
from spring 2013, it would more
rigorously enforce its policy of
requiring NIH-funded research
to be freely accessible to the public
within 12 months of publication.


PUBLIC-ACCESS SURGE FOR NIH

The US National Institutes of Health has seen a sharp rise in
the number of manuscripts submitted to its PubMed Central
(free access) database.

[Chart: manuscripts submitted for public access (thousands), 2006 to 2013. Annotations: NIH public-access mandate receives congressional backing; NIH says it will enforce mandate from ‘spring 2013’.]


2-9 JULY

Researchers discuss
cosmic-ray physics,
neutrino astronomy and
dark-matter physics at
the 33rd International
Cosmic Ray Conference
in Rio de Janeiro, Brazil.
go.nature.com/ipljwn


9-11 JULY

Imperial College in
London hosts SB6.0,
the Sixth International
Meeting on Synthetic
Biology, where topics
include biosecurity
risks and applications to
human health.
go.nature.com/w8dokr


UK funding 

The UK science budget, which 
has been frozen at an annual 
£4.6 billion (US$7 billion) 
since 2010, will not go up in 
the 2015-16 financial year, the 
government said on 26 June. 
But spending on infrastructure 
such as research facilities and 
buildings will increase from 
£0.6 billion to £1.1 billion, and 
will rise in line with inflation 
until 2020-21. The budget also 
gives an extra £185 million to 
the Technology Strategy Board, 
which funds business-led 
research projects. See go.nature.
com/popd9a for more.


Bioethanol-fuelled cars are rarer in Europe than those that run on biodiesel, although ethanol is greener.


EU debates U-turn 
on biofuels policy 


Key vote could signal withdrawal of support from biodiesel. 


BY RICHARD VAN NOORDEN 


The European Union (EU) has spent the
past 10 years nurturing a €15-billion
(US$20-billion) industry that makes 
transport fuel from food crops such as soya 
beans and sugar cane in the hope of cutting 
greenhouse-gas emissions. Yet for more than 
half a decade, scientists have warned that many
food-based fuels might actually be boosting 
emissions relative to fossil fuels. 
Now the EU could change course by setting 
a cap on the use of food-based biofuel, but 
pressure from industry, farming and energy 
lobbies threatens to limit the reversal. Tensions 
are rising over how much of the emerging sci- 
ence on biofuel emissions will be included in 
EU policy ahead of a vote on 10 July by the key


European Parliament committee dealing with 
the legislation. 

Europe began mandating the development 
and use of biofuels in 2003. The two latest 
laws on the subject, passed in 2009, require a 
6% drop in the carbon footprint of transport 
fuel by 2020, by which time renewable energy 
must fuel 10% of the transport sector. Biofuel 
counts towards that requirement if it produces 
a 35% emissions saving over fossil fuels, or 50% 
from 2017 onwards; so far, most of that fuel 
has come from food crops, helping to generate 
a thriving biofuels industry based mainly on 
biodiesel. Europe is even importing rapeseed 
and vegetable oil to meet demand. 

But the original accounting for biofuel emis- 
sions was all wrong, as Tim Searchinger, who 
studies environmental economics at Princeton
University in New Jersey, noted in an influential
2008 article (T. Searchinger et al. Science 319, 
1238-1240; 2008). He and his colleagues found 
that when agricultural land is used to plant bio- 
fuel crops, fresh land may be ploughed up to 
accommodate the existing crops that have been 
edged out. Ultimately, that may drive clear- 
ing of forests, peatlands and wetlands rich in 
sequestered carbon — causing large emissions 
of carbon dioxide. “It’s kind of obvious if you 
think about it,” says Searchinger. 

Calculating this ‘indirect land-use change’ 
(ILUC) effect is complicated, because it 
is based on economic models projecting 
behaviour 10 or 20 years into the future. The 
numbers are different for different crops (see 
‘Carbon conundrum’). But overall, when 
land-use effects are taken into account, most 
varieties of biodiesel turn out to produce more 
emissions than bioethanol — and often more 
than fossil fuels. 

The effect wipes out more than two-thirds 
of the carbon emissions that Europe’s renew- 
able-energy policy was supposed to save by 
2020, says David Laborde, a researcher at the 
International Food Policy Research Institute 
(IFPRI) in Washington DC, which has pro- 
duced influential reports for the European 
Commission. 

In the United States, the Environmental 
Protection Agency did take the land-use effect 
into account in 2010, when it set standards for 
which fuels count as renewable. Luckily for 
US farmers, ethanol from maize (corn) — the 
main biofuel for US vehicles — was given the 
green light under the agency’s rules. 

But the European Commission has ducked 
the issue in the face of strong resistance from 
the biofuels industry and Europe’s energy and 
agricultural sectors. 

In October 2012, the commission finally 
proposed that food-crop fuel quotas be capped 
at only 5% of transport fuel by 2020 — half of 
the 10% renewables target — effectively allow- 
ing existing facilities to continue recouping 
investment, but stopping further expansion. 

“I think they got it exactly right: the answer
is to stop,” says Searchinger. Under the pro- 
posal, land-use figures would not be used 
to select one biofuel over another. But fuel 
suppliers would have to start including land- 
use figures produced by the IFPRI when 
they report the total emissions of their fuels, 
a hint that the official carbon footprint of 
Europe’s transport fuel might eventually
incorporate that science.

The European Parliament now gets to 
battle over the commission's proposals. 
On 20 June, its energy committee voted to 
push the cap on food-crop fuels up slightly, 
to 6.5%. It also removed the stipulation that 
fuel suppliers report emissions using land- 
use change figures. Instead, the committee 
proposed gradually increasing mandates 
for use of advanced biofuels not made from 
food crops. 

“The science of ILUC is not robust 
enough for policy,” argues Clare Wenner,
head of renewable transport policy at the 
UK Renewable Energy Association in Lon- 
don. But Europe’s Joint Research Centre in 
Brussels says that the models used to calcu- 
late the land-use numbers are no less cer- 
tain than the accepted science on the direct 
emissions of biofuels — and urges that they 
be included. The environment committee 
will vote on its preferred policy on 10 July: 
its lead negotiator on this issue, Corinne 
LePage, agrees with the Joint Research 
Centre and is pushing to incorporate land- 
use change numbers to distinguish between 
better and worse food-crop biofuels. But 
she may not get her way. 

The battle does not end there: the main 
parliament will vote on the issue in Sep- 
tember, based largely on what the environ- 
ment and energy committees recommend. 
Then Europe's energy ministers will have 
to reach a compromise on the legislation. 
Some countries — such as the United 
Kingdom, the Netherlands and Denmark 
— want land-use factors to be included, 
whereas others, including central and east- 
ern European countries with strong biofuel 
lobbies, do not. Although this month's vote 
will lay out the main lines of argument, it 
is conceivable that nothing will be agreed 
until 2014 — when European Parliament 
elections in May could set negotiations 
back to square one. “It’s head-bangingly 
complicated,” says Wenner.


CARBON CONUNDRUM 


Indirect land-use change (ILUC) effects mean 
that some biofuels produce more carbon 
emissions than fossil fuel.


After drinking water was tainted in 2011, people in Mianyang, China, had to take emergency measures.


ENVIRONMENTAL POLICY 


China gears up to 
tackle tainted water 


Government is set to spend 500 million renminbi to clean up 
groundwater polluted by industry and agriculture. 


BY JIAO LI IN BEIJING 


When rumours swirled earlier this
year that factories in Weifang,
China, were discharging waste
water into the region’s aquifers — the principal 
source of drinking water for the city’s 9 mil- 
lion residents — citizens flocked to the Web 
to register their outrage on microblogging site 
Sina Weibo. The rumours were finally con- 
firmed by officials in late May, further stok- 
ing public fears over an already hot issue: the 
sorry state of the water that so many Chinese 
people drink. 

Now, a massive government investigation 
has documented the scope of the problem in 
northern China, and officials have formulated 
an ambitious plan to tackle it. 

About 18% of the water that China uses 
comes from groundwater, and more than 
400 of the country’s roughly 655 cities have 
no other source of drinking water. Much of 
the groundwater is contaminated, tainted by 
fertilizers, pesticide residues and dirty waste 
water used for irrigation in China’s vast rural 
regions, as well as pollutants from mining, the
petrochemical industry, and domestic and
industrial waste. Heavy metals are especially 
problematic, because “once in the groundwa-
ter, they don’t go away”, says Sun Ge, a research
hydrologist at the US Department of Agricul- 
ture’s Forest Service Southern Research Station 
in Raleigh, North Carolina. “It will be very 
expensive to clean up, if it is even possible.” 

In 2006, to assess the scope of the problem, 
the Chinese Ministry of Land and Resources 
launched a 6-year investigation focused on the 
North China Plain, the region most depend- 
ent on groundwater, which is home to nearly 
130 million people. In late April this year, the 
government announced a work plan for con- 
trol of groundwater contamination in the area. 
“The work plan is actually quite remarkable, 
and it is certainly a step in the right direction,” 
says Zheng Yan, who studies groundwater pol- 
lution and public health at Columbia Univer- 
sity in New York. 

The extent of the problem is unclear because
the full results of the 2006 survey have not
been made public. An
official at the China Geological Survey, which
commissioned the report, declined to offer 
details for fear of alarming the public. How- 
ever, the government's action plan acknowl- 
edges that the levels of pollution are serious. A 
2012 report by the land ministry found that of 
4,929 groundwater monitoring sites in 198 pre- 
fecture-level administrative regions across the 
country, 41% had poor water quality. Almost 
17% had extremely poor water quality, with 
levels of iron, manganese, fluoride, nitrites, 
nitrates, ammonium and heavy metals exceed- 
ing safe limits. 

Also last year, an article by Zhang Zhaoji, 
a hydrogeologist at the Chinese Academy of 
Geological Sciences’ Institute of Hydrology 
and Environmental Geology in Hebei and pro- 
ject leader for the 2006 survey, reported that 
in the North China Plain, some 35% of shal- 
low groundwater sampling points had been 
contaminated by human activities (Z. Zhang 
et al. J. Jilin Univ. Earth Sci. Edn 42, 1456-1461; 
2012). “Water pollution is a more serious prob-
lem than the scarcity of water resources,” says
Song Xianfang, a hydrologist at the Institute of 
Geographic Sciences and Natural Resources 
Research (IGSNRR) in Beijing, part of the Chi- 
nese Academy of Sciences. 


The contamination rates are “not a sur-
prise, as China is under rapid urbanization
and industrialization that bring problems of
water pollution for both surface and ground-
water”, says Sun. And, although it is hard to
prove cause and effect, there will probably be
fallout for public health, experts say. Govern-
ment reports stated that in 2004, China had
38.8 million recorded cases of tooth-enamel
damage owing to fluoride exposure;
2.84 million cases of bone disease owing
to fluoride exposure;
and 9,686 cases of arsenic poisoning.

“These diseases are closely related to 
environmental and geological factors [and 
are] especially associated with contaminated 
groundwater,” says Yang Linsheng, the director
of the department of environmental geography 
and health at the IGSNRR. The Chinese Center 
for Disease Control and Prevention did not 
respond to Nature’s request for an interview. 

In its plan, the government says that it 
will divide the North China Plain into 30 
units for pollution prevention and control, 
which it will separate into three severity
categories — serious, poor and good — to
be addressed differently. The details, which 
have not been publicly released, include an 
investment of nearly 500 million renminbi 
(US$81 million) between 2013 and 2020 for a
raft of measures across the country: to increase 
pollution assessments and establish a data- 
base of results; to control river pollution from 
agriculture and point sources from industry 
and landfill; to treat polluted areas; and
to conduct more research into clean-up and 
prevention strategies. Among other things, 
researchers will look into the effects of shale- 
gas development on groundwater. 

The plan will also beef up environmen- 
tal regulation. Experts say that will be a key 
measure, because the country must become 
more selective in approving industry projects. 
It must also enhance regulation of polluters, 
especially small rural companies such as paper 
mills. Furthermore, farmers must be educated 
in the proper use of fertilizers. Openness will 
be crucial in gaining public trust, experts add. 
“I would advocate data-sharing and transpar-
ency in reporting data,” says Zheng Chunmiao,
director of the Center for Water Research at 
Peking University in Beijing. “Without this, 
people will be anxious.”


EDUCATION


Evolution makes the grade 


Kansas, Kentucky and other states will also teach climate-change science. 


BY LAUREN MORELLO 


BY DESIGN 


Five US states have adopted
science education standards
that recommend introduc-
ing two highly charged topics
— climate-change science and 
evolution — into classrooms well 
before high school. 

Released in April, the Next Gen- 
eration Science Standards are the 
first effort in 15 years to overhaul 
US science education nationwide. 
Twenty-six states, working with 
non-profit science and education 
groups, developed the guidelines 
on the basis of recommendations 
from the US National Research 
Council. And the measures are being adopted, 
even in states where climate change and evolu- 
tion tend to be avoided in the classroom. 

In the past two months, education officials 
in Rhode Island, Kentucky, Kansas, Maryland 
and Vermont have all approved the standards 
by overwhelming margins. At least five more
states — California, Florida, Maine, Michigan
and Washington — may take up the standards
in the next few months.

US state legislatures are increasingly introducing ‘academic freedom’ bills to
allow educators to teach creationism. Since 2008, some of these bills would
also allow teaching material that promotes climate-change scepticism.
[Chart: number of state bills introduced, 2004 to 2013, split into anti-evolution bills and anti-evolution and anti-climate-change bills.]

“Whew,” says Minda Berbeco, programmes
and policy director at the National Center for 
Science Education in Oakland, California. “So 
far, so good.” Swift adoption of the guidelines 
has been surprising but welcome news for many 
supporters. Evolution has been a controversial
topic in US education for decades,
stretching back to the 1925 ‘monkey 
trial’ in Tennessee, where the state 
prosecuted high-school teacher 
John Scopes for violating a statute 
that barred the teaching of evolu- 
tion. In the past decade, those who 
oppose evolution have sought to 
enact ‘academic freedom’ laws 
that would allow creationism to be 
taught alongside evolution. 

Increasingly, that sort of legisla- 
tion also seeks to promote criticism 
of mainstream climate science (see 
‘By design’). Berbeco says that this 
allows opponents of evolution and 
climate-change education to band 
together. “More people hate evo- 
lution and climate change than just evolution 
alone,” she says. 

Laws passed in Louisiana in 2008 and in 
Tennessee last year allow teachers to present 
material that undermines global warming 
and evolution, two subjects that have been 
specifically singled out in the statutes. Similar 
bills were introduced this year in Arizona,
Colorado, Kansas and Oklahoma.

The standards are the first national 
guidelines to incorporate climate change, 
which is already taught in some schools. 
But it has proved daunting for many 
educators, because the subject requires 
teaching aspects of biology, physics and 
chemistry. “It’s a little piece of everything,” 
says Rouwenna Lamm, deputy director 
for national outreach at the Alliance for 
Climate Education in Oakland. The guide- 
lines recommend introducing the subjects 
early on, teaching students in middle school 
that human activities, including the burn- 
ing of fossil fuels, have warmed the planet. 
As students get older, that idea should be 
expanded to encompass discussions of 
climate models and potential policies to 
limit greenhouse-gas emissions. Like- 
wise, the guidelines recommend teaching 
evolution before students reach high- 
school biology classes, the point at which 
many states tackle concepts such as natural 
selection and adaptation. 

The standards have faced legal challenges 
in some states, although the framework 
has so far escaped unscathed. For example, 
Kansas lawmakers last month narrowly 
defeated a measure to block state funding 
to implement the guidelines — quashing 
the proposal just hours before lawmakers 
adjourned for the year. In Kentucky, the 
state board of education unanimously 
approved the standards on 5 June, but they 
must now undergo a public hearing and a
subsequent legislative review before teach- 
ing can begin. 

That places the guidelines squarely in the 
path of a high-powered critic who will help 
to steer the legislative review: Mike Wilson, 
Republican state senator and chairman of 
the Kentucky Senate’s education committee, 
who is a climate-change sceptic and advo- 
cate of intelligent design. “Political correct- 
ness bears watching and should never be 
the arbiter of learning,” he wrote in a May 
article published in The Courier-Journal, a 
Kentucky newspaper. 

Robert Bevins, a toxicologist and presi- 
dent of Kentuckians for Science Education, 
an advocacy group formed in February in 
part to push for the adoption of the stand- 
ards, says that he is gearing up for a hard 
fight. “Kentucky has a love-hate relationship 
with science,” he says, noting that the state 
has a thriving coal industry that has opposed 
greenhouse-gas regulations and is also home 
to the Creation Museum near Petersburg. 

Richard Innes, an education analyst 
with the conservative Bluegrass Institute 
for Public Policy Solutions in Lexington, 
Kentucky, predicts that the guidelines will 
be sent back to the state education board 
for revision after the public hearing this 
month. But ultimately, he says, “I think the 
science standards will go through.”


16 | NATURE | VOL 499 | 4 JULY 2013 


Some synthetic fluorescent proteins made by DNA2.0 are now freely available to researchers. 


BIOTECHNOLOGY 


Bioengineers look
beyond patents 


Synthetic-biology company pushes open-source models. 


BY HEIDI LEDFORD 


When DNA2.0, a company that syn-
thesizes made-to-order genes,
needed to conduct a few routine
experiments using a fluorescent protein, its 
lawyers dug up more than 1,000 US patents 
covering their use. DNA2.0 decided to avoid 
the legal thicket by engineering several dozen 
fluorescent proteins from scratch. But the 
company, based in Menlo Park, California, 
was convinced that something had to change. 

Last month, DNA2.0 deposited gene 
sequences encoding three of its fluorescent 
proteins into an open-access collection of 
recipes for DNA ‘parts’, molecular building
blocks used to engineer organisms — often 
bacteria — to carry out specific functions. The 
company vows not to pursue its patent rights 
against anyone using the sequences. 

Such moves are unusual among larger bio- 
technology companies, which tend to guard 
patents fiercely, but for DNA2.0 the choice 
was strategic, says Claes Gustafsson, the firm’s 
chief commercial officer. Synthetic biologists 
aim to bring engineering principles to bear on 
genetic manipulation, and the field’s success 
hinges on the creation of standardized parts 
that can be combined in predictable ways. The 
company wants to create incentives for other 
synthetic-biology firms to design custom 
organisms for which DNA2.0 can synthesize
the parts. “We have a lot of customers in small
biotech companies,” Gustafsson says, “and the 
intellectual-property situation for them is just 
a nightmare.”

Easing that situation will be a key point of 
discussion next week at the Sixth Interna- 
tional Meeting on Synthetic Biology, to be 
held in London by the BioBricks Foundation, 
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Endy, a synthetic biologist at Stanford Univer- 
sity in California. Endy says that synthetic 
biologists are similar to software engineers, 
with the genetic code as their programming 
language. The software industry has favoured
open-source approaches and copyright pro- 
tections, because inventions often come faster 
than patents can be acquired. 

The same holds for synthetic biology: if 
every genetic building block comes with a 
patent attached, cellular engineers may end up 
negotiating legal minefields. Few firms will sue 
a scientist who infringes a patent in the course 
of academic research, but young synthetic- 
biology companies are vulnerable. 

Two years ago, the BioBricks Foundation
borrowed elements from the open-source
software movement to develop a public 
agreement for designers of synthetic- 
biology parts. But the 708 parts in the Bio- 
Bricks open-source collection come from 
only three donors: DNA2.0, Endy and 
Ginkgo BioWorks, a synthetic-biology 
company in Boston, Massachusetts. Com- 
mercial use of some of the highest-impact 
parts is still kept under lock and key by 
industry or academic labs. 

Mark Fischer, a copyright lawyer at 
Duane Morris in Boston, and a key archi- 
tect of the BioBricks agreement, says that 
it is too soon to judge the project. He says 
that DNA2.0’s contribution to the registry 
is a sign that the movement is taking off. “I
think we’re now at the dawn of that hap-
pening,” says Fischer, who also helped to
pioneer open-source software agreements. 

The open-source push in synthetic biol- 
ogy has also rekindled talk of copyrighting 
engineered DNA sequences. Copyrights 
protect certain types of work from being 
reproduced without permission, but users 
may substantially modify those creations. 
The United States started granting such pro- 
tections to computer programs in the 1960s. 

DNA2.0 plans to find out whether DNA 
sequences can also be shoehorned into the 
framework. Last year, the company peti- 
tioned for US copyright protection of the 
DNA sequence for a fluorescent green pro- 
tein, without success, but has launched an 
appeal. Its plan, says Christopher Holman, 
a law professor at the University of Mis- 
souri-Kansas City who is working with 
DNA2.0, is to pursue the appeal until the 
issue is heard in court. 

Copyrights are cheaper, easier alterna- 
tives to patents, says Endy. They cost $35 to 
register, as opposed to the $100,000 in legal 
fees and administrative costs that DNA2.0 
says it pays for each patent application it 
files. But Endy worries about the duration 
of copyright protections, which can last up 
to 120 years; patents, by contrast, expire 
after 20. 

And patents are still useful for some 
inventions, says Gustafsson. DNA2.0 will 
continue to patent some of its engineered 
genes and proteins. “We play in the same sys-
tem as everyone else,” says Gustafsson. “But
we also want to increase our market size.”


US Senate backs
immigration plan 


Proposal would lift visa caps for US-trained scientists and engineers.


BY HELEN SHEN


in 2003 helped to fulfil a long-held ambi- 
tion of pursuing scientific research in the 
United States. In 2009, Basu, a native of India, 
earned his PhD in biomedical sciences from 
Eastern Virginia Medical School in Norfolk. 
But Basu is struggling to keep his American 
dream alive after finishing a postdoctoral fel- 
lowship at Old Dominion University in Norfolk 
in 2011. With his temporary work visa set to 
expire in 2015, he is now working as a consult- 
ant in northern Virginia — and fighting tough 
odds to stay in the United States permanently 
by applying for a coveted but scarce ‘green card’.
Those green cards could soon flow more 
freely to scientists such as Basu. After years of 
debate and many failed attempts, on 27 June the 
US Senate approved a comprehensive immi- 
gration plan that would allow thousands more 
foreign scientists and engineers to remain in 
the United States permanently. “It’s a phenom-
enal improvement over the current situation,”
says Russell Harrison, a senior legislative rep- 
resentative for IEEE-USA in Washington DC, 
which advocates for US members of the Insti- 
tute of Electrical and Electronics Engineers. 
Under current policy, the number of green 
cards that can be issued each year is limited to 
140,000, a figure that is further reduced by per- 
country caps. Applicants from countries that 
send large numbers of immigrants — such as 
China, India, Mexico and the Philippines — 
must often wait for years, subsisting on a string 
of temporary work visas that can be revoked at 
an employer's discretion. 
“Our system is absolutely, utterly broken,”
says Amy Scott, associate vice-president
for federal relations at the Association of
American Universities in Washington DC.

Most holders of doctoral degrees who have
temporary US work visas come from India and
China. Many of them have trouble securing
permanent residency.

The Senate bill would end country-based 
caps and exempt researchers in some disci- 
plines from limits altogether. Applicants with 
master’s or doctoral degrees in science, tech- 
nology, engineering or mathematics (STEM) 
obtained from US universities would be 
eligible to tap an unlimited pool of green cards. 
And, unlike previous proposals, the bill brings 
biological and biomedical sciences under the 
STEM umbrella. 

According to the most recent statistics from 
the National Science Foundation, about 25% 
of the US science and engineering workforce 
comes from other countries. People from 
China and India made up nearly half of PhD 
holders who received temporary work visas in 
2009 (see ‘Short stays?’). And many of them 
lead tenuous lives in their adopted country. 

Among them is Somiranjan Ghosh, 
a senior research associate in molecu-
lar genetics at Howard University in
Washington DC. Ghosh came to the
United States from India in 2003 for a 
postdoctoral fellowship at the National 
Cancer Institute in Bethesda, Maryland, and 
finished a second fellowship at Howard Uni- 
versity in 2007. He applied for permanent resi- 
dency in December 2010 and was approved in 
2011, but he has yet to receive his green card. 
Ghosh also wants to travel abroad, but, with- 
out a green card, he could encounter admini- 
strative delays when he tries to re-enter the 
United States. Last year, he turned down an 
invitation to speak at a conference in France. 
Worries about conference travel are a big 
problem for many postdoctoral fellows on tem- 
porary visas, says Benjamin Corb, director of 
public affairs at the American Society for Bio-
chemistry and Molecular Biology in Rockville,
Maryland. “They just don't go, so they lose
out on that opportunity in their professional
career,” he says. That understandable caution
can also exact heavy personal costs. Ghosh 
was too afraid to return to India to see his 
sister before she died of cancer in January. 

Ghosh’s visa, unlike a green card, does not
allow him to change jobs easily. He would
like to move into the field of medical
diagnostics and eventually start
his own company. “I’m 45 now,” he says. “I want
to start my own career.”

Hopes of clearing the green card logjam now 
rest with the Republican-controlled House of 
Representatives, and its leaders are preparing 
separate proposals to address immigration.
Although increased immigration for scientists 
and engineers enjoys broad bipartisan sup- 
port, Republicans argue that STEM green cards 
should be created only at the expense of other 
categories, such as the annual green card lottery 
for natives of countries that send few immi- 
grants to the United States. But Democrats, 
who control the Senate, reject that notion. Any 
changes to the visa system will require agree- 
ment by both sides on a broader suite of hotly 
contested immigration issues. 

For Basu, the stakes may be higher than for 
most. He and his wife are expecting their first 
child in three weeks, and he worries that the 
family may ultimately have to move back to 
India — away from the life they have created in 
Virginia. “Our kid will be an American citizen,” 
says Basu. “We have roots here.”


European deal cuts red tape 


Horizon 2020 research programme streamlines project reimbursements. 


BY QUIRIN SCHIERMEIER 


A deal struck last week during negotia-
tions on the research programme for
the next seven years in the European
Union (EU) promises a significant change to 
the way in which institutions are reimbursed 
for the overhead costs of their research. The 
agreement for Horizon 2020 sweeps away the 
onerous red tape involved in the present diverse 
arrangements and replaces it with an across- 
the-board 25% reimbursement rate for all. 

Although the deal could be a boon for the 
many European universities with low over- 
heads, which include heating, lighting, rent 
and facilities maintenance, it has disappointed 
some operators of large research facilities, 
mainly those in Western Europe. They warn 
that the simplified funding rules could harm 
top-ranking centres with high overheads 
because they will need reimbursement beyond 
25% of the total direct costs. 

“The new rules threaten to make Horizon 
2020 extremely unattractive, particularly for 
research organizations dedicated to innovation,”
says Reimund Neugebauer, president of the 
Fraunhofer Society, headquartered in Munich, 
Germany, which carries out contract research 
for industry. 

OVERHEAD HEADACHES
A hypothetical €1-million grant would net 25% for
indirect costs under new EU rules, contrasting
with the variability of the previous system.
(Chart legend: direct costs, indirect costs. Bars shown: new flat rate; old full cost, low overheads.)

Details of Horizon 2020, due to start next
January, have been under negotiation since
February in a series of talks between EU member
states, the European Parliament and the Euro-
pean Commission. The three were united on
the programme's goal to spur economic growth 
and on its broad themes, which include health 
and energy research. But the parliament and 
member states have been squabbling over what 
accounting rules might best serve Europe's 
paperwork-plagued research community. The 
United States, too, has stumbled over funding 
of indirect costs (see ‘Transatlantic concerns’),
but some had feared that the European dead- 
lock over the issue would delay the start of 
Horizon 2020. 

Keen to simplify the affair, the European 
Commission and most member states threw 
their support behind a system that would pay 
grant-winners the full direct costs of a project,
such as salaries, travel and laboratory supplies, 
plus a 25% flat rate to cover overheads. Such a 
move would also please the EU’s auditors — in 
a report released on 7 June, they slammed the 
complex funding model used in the organiza- 
tion’s 2007-13 research programme. 

But some Members of the European Par- 
liament (MEPs) — backed by the European 
University Association in Brussels, which 
represents many of Europe’s universities 
and research institutes — held that such an 
approach would make participation in Hori- 
zon 2020 unattractive for institutions with high 
overheads. Universities that run expensive 
facilities, for example ocean-going research 
vessels and synchrotron machines, would be 
left out of pocket, as would organizations such 
as the Fraunhofer, which have high overheads 
because the contract research they carry out 
often involves the use of expensive industry- 
owned research facilities. 

Critics of the flat rate were pushing for the 
‘full cost’ reimbursement model used in the 
last EU research programme. This would 
have allowed organizations to get 75% of their 
direct costs plus 100% of their indirect costs — 
which can sometimes be as high as the direct 
costs (see ‘Overhead headaches’). In the end, 
a majority of member states and the commis- 
sion gained the upper hand in their attempt 
to simplify the accounting. At the meeting 
last week, MEPs reluctantly agreed that Hori- 
zon 2020 would use only the flat-rate model. 
The deal must still be formally approved by
parliament and EU member states, but it is
expected to pave the way for Horizon 2020 to
start on schedule.

Research carried out by Germany’s Fraunhofer Society might be hampered by European funding rules.
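
The difference between the two reimbursement models described above can be worked through with a little arithmetic. The sketch below is illustrative only: the function names and the example figure of €1 million in direct costs are assumptions made for this example, and the rates (full direct costs plus 25% under the flat rate; 75% of direct costs plus 100% of indirect costs under the old 'full cost' model) are taken from the article's description rather than from the regulation itself.

```python
def flat_rate_payment(direct: float, rate: float = 0.25) -> float:
    """Flat-rate model as described in the article:
    full direct costs plus a fixed percentage of them for overheads."""
    return direct * (1 + rate)


def full_cost_payment(direct: float, indirect: float) -> float:
    """'Full cost' model as described in the article:
    75% of direct costs plus 100% of indirect costs."""
    return 0.75 * direct + indirect


def break_even_indirect(direct: float, rate: float = 0.25) -> float:
    """Indirect costs at which both models pay the same amount.
    Solving 0.75*d + i = (1 + rate)*d gives i = (0.25 + rate)*d."""
    return (1 + rate - 0.75) * direct


if __name__ == "__main__":
    direct = 1_000_000  # hypothetical project with EUR 1 million in direct costs
    print("flat rate:", flat_rate_payment(direct))
    # Low-overhead institution (indirect costs of 20% of direct costs, assumed)
    print("full cost, low overheads:", full_cost_payment(direct, 200_000))
    # High-overhead institution (indirect costs equal to direct costs)
    print("full cost, high overheads:", full_cost_payment(direct, 1_000_000))
    print("break-even indirect costs:", break_even_indirect(direct))
```

On these assumptions, an institution whose indirect costs are below half of its direct costs receives more under the flat rate, whereas one whose overheads approach its direct costs receives less, which is consistent with the split in reactions reported here.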

“The parliament managed to safeguard many 
improvements and substantial simplification 
for participants,” said Christian Ehler, an MEP 
with the centre-right European People’s Party 
and parliament's lead negotiator for Horizon 
2020, in a statement to Nature. “But I dread the
fact that the parliament had to consent to the 
council’s funding model” because it will dra- 
matically disadvantage some institutions. 

Nonetheless, some of Europe’s elite research
universities are pleased with the promised
reduction in red tape. “Having one rule for all is
a major improvement,” says Kurt Deketelaere,
secretary-general of the League of European 
Research Universities, a partnership of 21 top 
universities. “Imagine the insane complexity
in collaborating with research organizations
and companies which all follow different
rules. That system had to go.” By and large,
says Deketelaere, universities will be better off
financially than they were under previous EU
research programmes.

TRANSATLANTIC CONCERNS
Flat rate overruled

The thorny issue of overhead payments
is not restricted to Europe. In the United
States, the average reimbursement
rate is around 50% of direct project
costs, but top institutes such as Harvard
University in Cambridge, Massachusetts,
receive up to 70% of extra money from
federal grants. Critics say that the current
practice unfairly favours a few research
powerhouses over many other, smaller
universities. However, an attempt last
year by President Barack Obama’s
administration to introduce a single flat
rate met with fierce opposition from
large institutes such as Harvard and the
Massachusetts Institute of Technology in
Cambridge. The plan was abandoned. Q.S.

But the commission has promised to address 
the concerns of those unhappy with the new 
rules. A recent commission working paper 
seen by Nature proposes that more of the costs 
incurred in operating research facilities could 
be reimbursed if the money were interpreted 
as being fully related to a Horizon 2020 project. 
“We will take the commission at its word,” says
Neugebauer. 

Scientists in the 13 states that have joined the 
EU since 2004 could benefit from the changes 
thrashed out last week. Universities and insti- 
tutes there have less experience in dealing with 
EU bureaucracy — a prerequisite for claim- 
ing and verifying overhead costs. Moreover, 
their overheads tend to be smaller than those 
of facilities-rich Western European research 
centres. As a further sweetener, scientists in 
these countries who receive a Horizon 2020 
grant will get an annual salary bonus of €8,000 
(US$10,400). 

The flat-rate system could also help scien- 
tists in such countries to win a bigger slice 
of EU funding, says Krzysztof Frackowiak, 
director of the Polish Science Contact Agency 
in Brussels, which helps Polish institutions to 
negotiate EU red tape. The newer member 
states “haven't been able to get back from Brus-
sels nearly as much as they paid into European
research programmes”, he says.


CORRECTION 

The y-axis in the graphic ‘The rise of open
access’ in the News Feature ‘The true cost of
science publishing’ (Nature 495, 426-429; 
2013) was mislabelled. The correct version 
is online at go.nature.com/e8rsrb. 
With thousands of people in need of heart transplants,
researchers are trying to grow new organs.

HOW TO BUILD A HEART


BY BRENDAN MAHER 


Doris Taylor doesn’t take it as an insult when people
call her Dr Frankenstein. “It was actually one of the 
bigger compliments I’ve gotten,” she says — an affir- 
mation that her research is pushing the boundaries of 
the possible. Given the nature of her work as director 
of regenerative medicine research at the Texas Heart 
Institute in Houston, Taylor has to admit that the comparison is apt. She 
regularly harvests organs such as hearts and lungs from the newly dead, 
re-engineers them starting from the cells and attempts to bring them 
back to life in the hope that they might beat or breathe again in the living. 

Taylor is in the vanguard of researchers looking to engineer entire new 
organs, to enable transplants without the risk of rejection by the recipient's 
immune system. The strategy is simple enough in principle. First remove 
all the cells from a dead organ — it does not even have to be from a human 
— then take the protein scaffold left behind and repopulate it with stem 
cells immunologically matched to the patient in need. Voila! The crippling 
shortage of transplantable organs around the world is solved. 

In practice, however, the process is beset with tremendous challenges. 
Researchers have had some success with growing and transplanting 
hollow, relatively simple organs such as tracheas and bladders (see 
go.nature.com/zvuxed). But growing solid organs such as kidneys or 
lungs means getting dozens of cell types into exactly the right positions,
and simultaneously growing complete networks of blood vessels to keep 
them alive. The new organs must be sterile, able to grow if the patient 
is young, and at least nominally able to repair themselves. Most impor- 
tantly, they have to work — ideally, for a lifetime. The heart is the third 
most needed organ after the kidney and the liver, with a waiting list of 
about 3,500 in the United States alone, but it poses extra challenges for 
transplantation and bioengineering. The heart must beat constantly to 
pump some 7,000 litres of blood per day without a back-up. It has cham- 
bers and valves constructed from several different types of specialized 
muscle cells called cardiomyocytes. And donor hearts are rare, because 
they are often damaged by disease or resuscitation efforts, so a steady 
supply of bioengineered organs would be welcome. 

Taylor, who led some of the first successful experiments to build rat
hearts, is optimistic about this ultimate challenge in tissue engineering.
“I think it’s eminently doable,” she says, quickly adding, “I don't think
it’s simple.” Some colleagues are less optimistic. Paolo Macchiarini, a
thoracic surgeon and scientist at the Karolinska Institute in Stockholm, 
who has transplanted bioengineered tracheas into several patients, says 
that although tissue engineering could become routine for replacing 
tubular structures such as windpipes, arteries and oesophagi, he is “not
confident that this will happen with more complex organs”.

© 2013 Macmillan Publishers Limited. All rights reserved

OTT LAB/MASSACHUSETTS GENERAL HOSPITAL

Yet the effort may be worthwhile even if it fails, says Alejandro Soto-
Gutiérrez, a researcher and surgeon at the University of Pittsburgh in
Pennsylvania. “Besides the dream of making organs for transplantation, 
there are a lot of things we can learn from these systems,” he says — 
including a better basic understanding of cell organization in the heart 
and new ideas about how to fix one. 


THE SCAFFOLD 

For more than a decade, biolo- 
gists have been able to turn 
embryonic stem cells into 
beating heart-muscle cells in 
a dish. With a little electrical 
pacemaking from the outside, 
these engineered heart cells 
even fall into step and maintain 
synchronous beating for hours. 

But getting from twitching 
blobs in a Petri dish to a work- 
ing heart calls for a scaffold 
to organize the cells in three 
dimensions. Researchers may 
ultimately be able to create 
such structures with three-
dimensional printing — as
was demonstrated earlier this
year with an artificial trachea
(see Nature http://doi.org/m2q;
2013). For the foreseeable
future, however, the complex 
structure of the human heart 
is beyond the reach of even the 
most sophisticated machines. 
This is particularly true for the 
intricate networks of capillar- 
ies that must supply the heart 
with oxygen and nutrients and 
remove waste products from 
deep within its tissues. “Vas-
cularity is the major challenge,”
says Anthony Atala, a urologist
at Wake Forest University in
Winston-Salem, North Caro-
lina, who has implanted bio-
engineered bladders into patients and is working on building kidneys
(see Nature http://doi.org/dw856h; 2006).

The leading techniques for would-be heart builders generally involve 
reusing what biology has already created. One good place to see how 
this is done is Massachusetts General Hospital in Boston, where Harald 
Ott, a surgeon and regenerative-medicine researcher, demonstrates a 
method that he developed while training under Taylor in the mid 2000s. 

Suspended by plastic tubes in a drum-shaped chamber made of glass 
and plastic is a fresh human heart. Nearby is a pump that is quietly push- 
ing detergent through a tube running into the heart’s aorta. The flow 
forces the aortic valve closed and sends the detergent through the network 
of blood vessels that fed the muscle until its owner died a few days before. 
Over the course of about a week, explains Ott, this flow of detergent will 
strip away lipids, DNA, soluble proteins, sugars and almost all the other 
cellular material from the heart, leaving only a pale mesh of collagen, 
laminins and other structural proteins: the ‘extra- 
cellular matrix’ that once held the organ together. 

The scaffold heart does not have to be human. 
Pigs are promising: they bear all the crucial com- 
ponents of the extracellular matrix, but are unlikely 
to carry human diseases. And their hearts are rarely
weakened by illness or resuscitation efforts. “Pig tissues are much safer
than humans and there's an unlimited supply,” says Stephen Badylak, a
regenerative-medicine researcher at the University of Pittsburgh.

A decellularized human heart awaits rebuilding with an injection of precursor cells.

The tricky part, Ott says, is to make sure that the detergent dissolves 
just the right amount of material. Strip away too little, and the matrix 
might retain some of the cell-surface molecules that can lead to rejec- 
tion by the recipient's immune system. Strip away too much, and it could 
lose vital proteins and growth factors that tell newly introduced cells 
where to adhere and how to 
behave. “If you can use a milder
agent and a shorter time frame,
you get more of a remodelling
response,” says Thomas Gilbert,
who studies decellularization 
at ACell, a company in Colum- 
bia, Maryland, that produces 
extracellular-matrix products 
for regenerative medicine. 

Through trial and error, 
scaling up the concentration, 
timing and pressure of the 
detergents, researchers have 
refined the decellularization 
process on hundreds of hearts 
and other organs. It is prob- 
ably the best-developed stage 
of the organ-generating enter- 
prise, but it is only the first step. 
Next, the scaffold needs to be 
repopulated with human cells. 


THE CELLS 

‘Recellularization’ introduces 
another slew of challenges, 
says Jason Wertheim, a sur- 
geon at Northwestern Uni- 
versity’s Feinberg School of 
Medicine in Chicago, Illinois. 
“One, what cells do we use? 
Two, how many cells do we 
use? And three, should they be 
mature cells, embryonic stem 
cells, iPS [induced pluripotent 
stem] cells? What is the opti- 
mum cell source?” 

Using mature cells is tricky to say the least, says Taylor. “You can’t get 
adult cardiocytes to proliferate,” she says. “If you could, we wouldn't be 
having this conversation at all” — because damaged hearts could repair 
themselves and there would be no need for transplants. 

Most researchers in the field use a mixture of two or more cell types, 
such as endothelial precursor cells to line blood vessels and muscle pro- 
genitors to seed the walls of the chambers. Ott has been deriving these 
from iPS cells — adult cells reprogrammed to an embryonic-stem-cell- 
like state using growth factors — because these can be taken from a 
patient in need and used to make immunologically matched tissues. 

In principle, the iPS-cell approach could provide the new heart with 
its full suite of cell types, including vascular cells and several varieties 
of heart-muscle cell. But in practice, it runs into its own problems. One 
is the sheer size of a human heart. The numbers are seriously under- 
appreciated, says Ott. “It’s one thing to make a million cells; another 
to make 100 million or 50 billion cells.” And researchers do not know 
whether the right cell types will grow when iPS cells are used to reca- 
pitulate embryonic development in an adult heart scaffold. 

As they colonize the scaffold, some of the immature cells will take 
root and begin to grow. But urging them to become functional, beating 
cardiomyocytes requires more than just oxygenated media and growth
factors. “Cells sense their environment,” says Angela Panoskaltsis-
Mortari, who has been trying to build lungs for transplant at the Uni-
versity of Minnesota in Minneapolis. “They don't just sense the factors.
They sense the stiffness and the mechanical stress,” which in turn pushes
the cells down their proper developmental path.

To construct a new heart, researchers first remove all cells from a donor organ (left), leaving a protein scaffold. That is
seeded with cells (centre), which mature under the influence of growth factors and mechanical stimulation (right).

So researchers must put the heart into a bioreactor that mimics the 
sensation of beating. Ott’s bioreactors use a combination of electrical 
signals — akin to a pacemaker — to help to synchronize the beating 
cardiomyocytes seeded on the scaffold, combined with physical beating 
motions induced by a pump (see ‘Customized organs’). But researchers 
face a constant battle in trying to ape the conditions present in the human 
body, such as changes in heart rate and blood pressure, or the presence of 
drugs. “The body reacts to things and changes the conditions so quickly 
it’s probably impossible to mimic that in a bioreactor,” says Badylak.

When Taylor and Ott were first developing bioreactors, for decel-
lularized and repopulated rat hearts, they had to learn as they went
along. “There was a lot of duct tape in the lab,” Ott says. But eventually
the hearts were able to beat on their own after eight to ten days in the
bioreactor, producing roughly 2% of the pumping capacity of a nor-
mal adult rat heart. Taylor says that she has since got hearts from rats
and larger mammals to pump with as much as 25% of normal capacity, 
although she has not yet published the data. She and Ott are confident 
that they are on the right path. 


THE BEAT 
The final challenge is one of the hardest: placing a newly grown, engi- 
neered heart into a living animal, and keeping it beating for a long time. 
The integrity of the vasculature is the first barrier. Any naked bit of 
matrix serves as a breeding ground for clots that could be fatal to the 
organ or the animal. “You’re going to need a pretty intact endothelium
lining every vessel or you're going to have clotting or leakage,” says Gilbert.
Ott has demonstrated that engineered organs can survive for a time. 
His group has transplanted a single bioengineered lung into a rat, show- 
ing that it could support gas exchange for the animal, but the airspace 
fairly quickly filled with fluids. And an engineered rat-kidney trans-
plant that Ott’s group reported early this year survived without clotting,
but had only minimal ability to filter urine, probably because the process
had not produced enough of the cell types needed by the kidney (see
Nature http://doi.org/m2r; 2013). Ott’s team and others have implanted
reconstructed hearts into rats, generally in the neck, in the abdomen or
alongside the animal’s own heart. But although the researchers can feed
the organs with blood and get them to beat for a while, none of the hearts
has been able to support the blood-pumping function. The researchers 
need to show that a heart has much higher ability to function before they 
can transplant it into an animal bigger than a rat. 

With the heart, says Badylak, “you have to start with something that 
can function pretty well” from the moment the transplant is in place. 
“You can't have something pumping just 1 or 2 or 5% of the ejection 
fraction of the normal heart and expect to make a difference,” he says,
referring to a common measure of pumping efficiency. There is little 
room for error. “We're just taking baby steps,” says Panoskaltsis-Mortari. 
“We're where people were with heart transplant decades ago.” 

The decellularization process being cultivated by Ott and others is 
already informing the development of improved tissue-based valves 
and other parts of the heart and other organs. Bioengineered valves, for
example, may last longer than mechanical or dead-tissue valves because
they have the potential to grow with a patient and repair themselves.
And other organs may not need to be replaced entirely. “I’d be surprised
if within the next 5-7 years you don't see the patient implanted with
at least part of an artery, lobes of a lung, lobes of a liver,” says Badylak.

Taylor suspects that partial approaches could aid patients with severe 
heart defects such as hypoplastic left heart syndrome, in which half the 
heart is severely underdeveloped. Restoring the other half, “essentially 
forces you to build the majority of the things you need”, she says.

And these efforts could hold lessons for the development of cell thera- 
pies delivered to the heart. Researchers are learning, for example, how 
heart cells develop and function in three dimensions. In the future, par- 
tial scaffolds, either synthetic or from cadavers, could allow new cells to 
populate damaged areas of hearts and repair them like patches. 

The jars of ghostly floating organs might seem like a gruesome echo 
of the Frankenstein story, but Taylor says her work is a labour of love. 
“There are some days that I go, ‘Oh my god, what have I gotten into?’ 
On the other hand, all it takes is a kid calling you, saying ‘Can you help 
my mother?’ and it makes it all worthwhile.” SEE EDITORIAL P.6


Brendan Maher is a features editor for Nature based in New York. 
The International Center for Tropical Agriculture in Colombia holds 65,000 crop samples from 141 countries.


Feeding the future 


We must mine the biodiversity in seed banks to help to overcome 
food shortages, urge Susan McCouch and colleagues. 


Humanity depends on fewer than a
dozen of the approximately 300,000
species of flowering plants for 80% 
of its caloric intake. And we capitalize on 
only a fraction of the genetic diversity that 
resides within each of these species. This is 
not enough to support our food system in 
the future. Food availability must double in 
the next 25 years to keep pace with popula- 
tion and income growth around the world. 
Already, food-production systems are pre- 
carious in the face of intensifying demand, 
climate change, soil degradation and water 
and land shortages. 
Farmers have saved the seeds of hundreds 
of crop species and hundreds of thousands of 
‘primitive’ varieties (local domesticates called
landraces), as well as the wild relatives of crop
species and modern varieties no longer in use. 
These are stored in more than 1,700 gene 
banks worldwide. Maintaining the 11 inter- 
national gene-bank collections alone costs 
about US$18 million a year. 

The biodiversity stored in gene banks fuels 
advances in plant breeding, generates billions 
of dollars in profits, and saves many lives. For 
example, crossbreeding a single wild species 
of rice, Oryza nivara, which was found after 
screening more than 6,000 seed-bank acces- 
sions, has provided protection against grassy 
stunt virus disease in almost all tropical rice 
varieties in Asia for the past 36 years. Dur-
ing the green revolution, high-yielding rice
and wheat varieties turned India into a net 
food exporter. By 1997, the world economy
had accrued annual benefits of approximately 
$115 billion from the use of crop wild rela-
tives as sources of environmental resilience
and resistance to pests and diseases. 

The time is ripe for an effort to harness the 
full power of biodiversity to feed the world. 
Plant scientists must efficiently and system- 
atically domesticate new crops and increase 
the productivity and sustainability of current 
crop-production systems. 

Why does plant breeding need a boost? 
Because new, high-yielding seeds that 
are adapted for future conditions are a 
cornerstone of sustainable, intensified food 
production. Since the mid-1990s, progress
in conventional plant breeding has
slowed, despite the phenomenal yield
gains of the past. Part of the reason is that 
only the tip of the biodiversity iceberg has 
been explored and used.

Crop wild relatives, landrace varieties 
and previously undomesticated wild species 
represent sources of new variation for agri- 
culture. Such plants have survived repeated 
and extreme environmental challenges, yet 
their resilience and adaptive capacity remain 
largely untapped and poorly understood. 
A wealth of genetic information has been 
left behind throughout the history of plant 
domestication and scientific crop improve- 
ment. It must now be deployed. 

Plant breeders often worry that using 
wild species or landrace varieties is too risky, 
scientifically and economically. It took 
20 years and 34,000 attempts to cross a 
domesticated rice variety with a distantly 
related, highly salt-tolerant wild relative from 
India before fertile offspring were obtained.
It will now take at least 4-5 years of breed- 
ing to eliminate unwanted wild characters to 
generate a new high-yielding, salt-tolerant 
rice variety (see go.nature.com/knztl5). That 
is too long for most plant-breeding pro- 
grammes, especially in the private sector. 

Insufficient genetic and phenotypic infor- 
mation about most of the holdings in gene 
banks makes plant breeders even more 
reluctant. Politics has also created obstacles. 
The Convention on Biological Diversity (see 
go.nature.com/njehon) is an international 
treaty that, although vital for consolidat- 
ing efforts to conserve the diversity of life 
on Earth, has created significant barriers to 
the sharing of genetic material, including of 
domesticated plants and their wild relatives.

Happily, things are changing. The Interna- 
tional Treaty on Plant Genetic Resources 
for Food and Agriculture (ITPGRFA),
negotiated in 2004, now governs access to 
crop diversity. It mandates that a portion 
of any monetary benefits derived from 
the commercialization of products from 
gene-bank materials is put into a fund that 
supports conservation and sustainable use 
of crop genetic resources. 

On the technical front, we are now able to
use a plant’s genetic make-up to predict its
agronomic potential and traits. Plant breed- 
ers commonly use genetic markers to identify 
individual plants carrying specific genes for 
disease and pest resistance or stress tolerance, 
without ever exposing the plants to the rel- 
evant agents. Breeders can use genome-wide 
approaches to eliminate 70-80% of individu- 
als in any generation without having to invest 
in laborious multi-environment field testing. 


THREE STEPS 

How should we begin to mine biodiversity 
for food security? A logical first step is to 
obtain a sample of sequence information 
from the genomes of all non-duplicate plant 
samples in the world’s gene banks that are
available under the terms and conditions 
of the ITPGRFA — perhaps up to 2 million. 
This ‘fingerprint’ for each plant will serve as
the basis for assessing genetic relationships, 
and will make it possible to systematically 
select subsets of material for in-depth inves- 
tigations. Sequencing costs are plummeting, 
making such an effort feasible. 

Sequence data provide a genomic ‘parts
list’ that can help to decipher mechanisms
that enable plants to adapt to myriad
environments, and can guide our remodelling
of cropping systems for the future. Linking
sequence data with conventional ‘passport’
information about collection locality and
original environment should call attention
to the genetic potential of many hidden
crop resources.

Second, we must analyse the phenotypes of 
gene-bank accessions to evaluate their traits 
and overall performance. This is the most 
intellectually challenging, complex, costly 
and time-consuming stage. We cannot hope 
to evaluate all gene-bank accessions in all 
relevant environments, even with the advent 
of high-throughput phenotyping technolo- 
gies. Using sequence data in combination 
with phenotypic, geographical and ecological 
information will enable researchers to target 
field experiments strategically and to develop 
models that can predict plant performance. 
This will make plant breeding faster, more 
efficient and cheaper. 

Assessing the breeding potential of unfa- 
miliar plant materials typically requires them 
to be crossed with modern, ‘elite’ varieties.
Their offspring are then evaluated in envi- 
ronments of interest to farmers and breeders, 
over several years. Often, the genetics of high- 
performing offspring can be traced back to 
DNA inherited from wild or landrace donors 
that are agriculturally less productive. For 
example, the wild tomato species Solanum 
pennellii was used to double commercial 
tomato yields under a wide range of growing 
conditions’, and the wild rice species Oryza 
rufipogon increased yields of elite varieties of 
rice by more than 25% (ref. 8). Thus, useful 
genetic traits are moved across the breed- 
ing barrier, expanding the genetic diversity 
of domesticated plants and opening up new 
opportunities for environmental resilience 
and future gains in quality and yield’. 

A third key step is to create an internation- 
ally accessible informatics infrastructure to 
catalogue the diversity in the world’s seed 
collections. This would link seeds and genetic 
stocks directly to passport, genomic and 
phenotypic information’, thereby engaging
the creativity of geneticists and breeders
and fuelling plant improvement for years to
come. This requires an unprecedented effort 
in data management and sharing. Today, seed 
data are typically recorded and managed by 
different people, such as gene-bank curators, 
agronomists and breeders, often in different 
institutions and in different database systems. 

But it is doable. The Global Biodiversity 
Information Facility (GBIF) — an online 
network that facilitates open access to “infor- 
mation about the occurrence of organisms” 
— provides a good example of such an infra- 
structure and has changed how biodiversity 
is studied. But the GBIF does not currently 
handle the complex genomic data necessary 
for our efforts. 

Most importantly, results from genomics 
and agronomic research must be connected 
to the communities that are creating new 
varieties of crops. An international network 
of scientists in both the public and the 
private sectors must work together to 
provide seeds and plants to farmers and 
commercial plant breeders for further 
crossing and testing in different environ- 
ments. The research community must 
pay specific attention to the development 
of locally adapted varieties that meet the 
needs of the world’s poorest farmers. 

How much would such a systematic, 
concerted, collaborative global effort to 
feed the future cost? We estimate around 
$200 million annually. This seems like great 
value, given that as a society we have spent 
$3 billion on sequencing the human genome, 
$9 billion on constructing CERN’s Large 
Hadron Collider near Geneva in Switzer- 
land (plus about $1 billion a year in running 
costs) and can spend up to $180 million on 
a single fighter jet. 

After all, as the ecologist Charles Godfray 
put it: “If we fail on food, we fail on every- 
thing”.


Susan McCouch is professor of plant 
breeding and genetics and of plant biology
at Cornell University, Ithaca, New York.
e-mail: mccouch@cornell.edu 

On behalf of attendees and organizers of the 
Crop Wild Relative Genomics meeting held 
in Asilomar, California, in December 2012. 
See go.nature.com/nrpoe3 for full author list. 
The Milky Way lights up the night sky above the Navajo trail in Bryce Canyon, Utah.


ENVIRONMENTAL SCIENCE 


Hymn to fading stars


An exploration of humanity’s compulsion to banish darkness is highly enlightening, 


finds Tim Radford. 


Here is a paradox: the brilliance of
the Enlightenment happened by
candlepower. Clarity of vision came
courtesy of the dark. The enigmatic night sky 
must have gleamed everywhere as the giants 
of the post-Copernican revolution stum- 
bled home from their learned societies. In 
Birmingham in the English Midlands in the 
late eighteenth century, Matthew Boulton, 
James Watt, Erasmus Darwin, Joseph Priest- 
ley and Benjamin Franklin met by the full 
Moon, calling themselves the Lunar Soci- 
ety. The astonishing adventure of science 
has now almost eliminated true darkness. 
And for that huge and growing portion of 
humanity living in cities, it has bleached the 
night sky of all but a handful of stars. 

As Paul Bogard shows in his hymn to 
vanished darkness, The End of Night, this 
electric overdose comes at a high cost. It 
may be linked to sleep disorders, changes 
in migratory behaviour in birds and insects, 
stress and exhaustion in shift workers, and 
even obesity.

The End of Night: Searching for Natural Darkness in an Age of Artificial Light

Bogard’s book is a literary journey — in the
space of a few pages, we walk with Virginia
Woolf, Charles Dickens and Rétif de la
Bretonne. It is also a pilgrimage to our
capitals of light. We visit London, with its
1,600 surviving gas lamps; Paris, where
110,000 4.5-watt bulbs illuminate just one
courtyard in the Louvre Museum; and
Broadway’s ‘Great White Way’ in New York.
Bogard hunts true darkness, too: places far
from security lights, where nights are so
clear and dark that the stars begin to reveal
subtle gem-like colours, and the Milky Way
emerges as a sight of depth and structure.
He goes beyond the broad
splashes of electric brilliance now smeared
across continents, seeking places where dark- 
ness is conserved and sponsored by bodies 
such as the International Dark Sky Associa- 
tion in Tucson, Arizona, and the Night Sky 
Team of the US National Park Service. Bogard 
dines in Mantua, Italy, with one of the makers 
of the first world atlas of artificial night sky 
brightness (see P. Cinzano et al. Mon. Not. R. 
Astron. Soc. 328, 689-707; 2001). He sets off 
with amateur astronomers in darkest England 
and the United States. In Las Vegas, Nevada, 
where the brightest beam on Earth lights up 
the sky from the apex of the Luxor casino, 
he still — just — glimpses Rigel and Betel- 
geuse in the Orion constellation, and Sirius. 
He talks to researchers and engineers on two 
continents about the urban compulsion for 
brightness. Any evidence that ever brighter 
security lights equal ever greater security is 
dismissed as dubious: glare creates shadows 
in which predators can hide. Bogard and his 
interlocutors also conclude that we may in 
any case be surrendering to an ancient fear
of the dark. The European Union alone, he 
reports, is estimated to waste €1.7 billion 
(US$2.2 billion) a year on needless outdoor 
lighting. 

Bogard visits New York’s Museum of 
Modern Art to contemplate the testi- 
mony of Vincent van Gogh’s 1889 painting 
The Starry Night and, as a counterweight, 
Giacomo Balla’s 1909 Street Light (“Let's kill 
the moonlight!” was a rallying cry for Ital- 
ian Futurists). Bogard explores the biology 
of vision, the capacity of eyes to adjust to ever 
lower levels of light and the concept of ‘see- 
ing’ — the odd term stargazers use to record 
the atmospheric turbulence that makes stars 
twinkle. He tries to experience the dark- 
ness celebrated by Henry David Thoreau at 
Walden Pond in Concord, Massachusetts, 
but is stymied by the glow now emanating 
from the town. 

Bogard becomes a midnight sensualist. 
He goes into Death Valley in California 
and Nevada, and, training binoculars on 
the night sky, suddenly feels “as though I’m
falling. I have to pull away to find my bal- 
ance in the dark. The ground on which I’m 
standing, the cloth of stars above. The great 
nebula in Orion's belt, the Pleiades, Jupiter so 
bright and clear it makes me laugh.” He vis- 
its the Mont-Mégantic Starry Sky Reserve in 
Quebec Province, Canada, where local com- 
munities have turned darkness into astro- 
tourism. (Sadly, the sky is occluded by fog.) 

He also considers the victims of ‘white 
nights’: prisoners locked in an eternal glare, 
shift workers trapped in a cycle of sleep- 
lessness, and what you might call electric 
roadkill. In North America, some 500 spe- 
cies migrate by night and the catalogues of 
death by electrocution have been enough to 
trigger a Fatal Light Awareness Programme 
(FLAP). He cites the murderous night when 
50,000 birds followed a beam of light from 
a Georgia airport straight into the ground 
and the night when 1,500 migrating grebes 
in Utah were confused by lights reflected 
from clouds “and crashed into parking lots 
they mistook for ponds”. 

This is a rich book with a rewarding appen- 
dix of notes. The straining for descriptive 
effect occasionally obtrudes; Bogard teaches 
creative non-fiction to university students, so 
he will know Samuel Johnson’s advice about 
striking out the fine writing. The book’s ambi- 
tious scope also necessarily dictates a sacrifice 
of depth. But these are small things. The big 
thing is that, as you read it, you too will want 
to reclaim the night and perhaps rediscover 
the heavens of the Enlightenment.


Tim Radford is the author of The Address 
Book: Our Place in the Scheme of Things. 
He was science editor of The Guardian until 
2005. 

e-mail: radford.tim@gmail.com 
Books in brief


Love, Literature, and the Quantum Atom: Niels Bohr’s 1913 
Trilogy Revisited 

Finn Aaserud and John L. Heilbron OXFORD UNIVERSITY PRESS (2013) 
Science historian John Heilbron analyses the cultural underpinnings 
of physicist Niels Bohr’s creativity. Bohr’s immersion in works by 
Søren Kierkegaard and other greats of literature and philosophy fed
the wellsprings of his quantum atom theory, argues Heilbron (see 
Nature 498, 27-30; 2013). This is a unique contribution to the fanfare 
around the centenary of Bohr’s theory: it incorporates archivist Finn 
Aaserud’s assemblage of previously unpublished letters between 
Bohr and his family, and a reprint of Bohr’s ‘Trilogy’ of papers. 


The Universe in the Rearview Mirror: How Hidden Symmetries 
Shape Reality 

Dave Goldberg DUTTON (2013) 

Who knew symmetry could be so brilliantly entertaining? Physicist 
Dave Goldberg slings the reader straight in at the deep end of this 
big physics concept, but with enough masterly wit to keep you 
afloat. If you've ever longed to know the nitty-gritty on antimatter; 
puzzled over the exclusion principle; woken up in a cold sweat 
wondering why you are not a “sentient cloud of helium”; gritted your
teeth over the cosmological principle; or been terrified by the beasts 
of the ‘particle zoo’, this is for you. 


A Piece of the Sun: The Quest for Fusion Energy 

Daniel Clery DUCKWORTH (2013) 

“Fusion seems too good to be true,” notes Daniel Clery. But for 
researchers in this field, making the ‘perfect’ energy source a 
reality is central to a power-hungry age. Clery chronicles the march 
of fusion projects and innovative physicists from the 1940s on. 
From Peter Thonemann’s work on the Zero Energy Thermonuclear 
Assembly to Lyman Spitzer, Lev Artsimovich and later stars, we enter 
a prodigious realm of pinch plasmas, stellarators and tokamaks. 
Despite big hopes and machines to match, harnessing “a piece of 
the Sun” still faces economic and scientific hurdles, Clery shows. 


The Attacking Ocean: The Past, Present and Future of Rising Sea 
Levels 

Brian Fagan BLOOMSBURY (2013) 

In his wide-ranging study of rising sea levels from the end of the 
last Ice Age to today, Brian Fagan traces the impact on humanity. 
The scattered groups that faced early thaws adapted by moving 
to higher ground. But the growth of populations, industrialization 
and coastal cities since 1860 has now left hundreds of millions at 
risk from the sea’s climate-driven climb. Hurricane Sandy, Fagan
reminds us, underlines the need for adaptation strategies and coastal
defences from the United States to Bangladesh. 


The Prince of Medicine: Galen in the Roman Empire 

Susan P. Mattern OXFORD UNIVERSITY PRESS (2013) 

He was a Greek medic who patched up Roman gladiators — and 
the emperor Marcus Aurelius. The shadowy figure of Galen, whose 
treatises dominated Western medicine for centuries, here bursts 
into life. Susan Mattern shows that he used wine on wounds — 
although ignorant of its bactericidal properties — and contributed to 
anatomy (dissecting live Barbary macaques) and pharmacology. He 
was also arrogant, but Mattern argues that his clinical excellence in a
plague-ridden era far outshone his flaws. Barbara Kiser 
Paul Laffoley says viewers can absorb alien inspiration from his Thanaton III.


Think beyond 


Joanne Baker plunges into an exhibition on visionaries 


who break all the rules. 


Want to communicate with an
extraterrestrial? Place your palms
in the hand-shaped outlines and
stare into the central disembodied eye of
Paul Laffoley’s painting Thanaton III (1989). 
The US artist and architect maintains that 
the graphics, mandalas and symbols gracing 
the lower part of the canvas were passed on 
to him by an alien called Quazgaa Klaatu. By 
touching the painting, Laffoley suggests, you 
too may absorb that information. 

Time travellers and savants are also among 
the 22 visionaries whose remarkable works 
are on show at London's Hayward Gallery. 
The Alternative Guide to the Universe exhibi- 
tion celebrates artists whose ideas lie beyond 
the mainstream, but are often directed 
towards solving real-world problems. Fol- 
lowing a spate of exhibitions that explore 
the notions of ‘fringe’ scientists, inventors 
and architects — notably at the Wellcome 
Collection in London and the Institute For 
Figuring in Los Angeles, California (see
Nature 479, 40; 2011) — this wide-ranging
show reveals how the power of unconstrained
thought might be used for healing,
theorizing and utopia-building.

Alternative Guide to the Universe
Hayward Gallery, London. Until 26 August.

Many of the concepts bear reflection. Laf- 
foley’s alien ‘speaks’ in scientific terms, an 
assumption also central to the work of the 
SETI (Search for Extraterrestrial Intelli- 
gence) project. And although the alternative 
quantum theories depicted would never be 
accepted by a physics journal, they are built 
around conventional physical concepts such 
as oscillations and loops. The exhibition's 
greatest value lies in giving the green light to 
out-of-the-box thinking. 

Architecture steals the show. Most ingen- 
ious is Laffoley’s proposed design for Das 
Urpflanze Haus (the primordial plant house), 
an organic structure that can be grown from
genetically altered seeds. Inspired by Johann
Wolfgang von Goethe's 1790 description of
the archetypal plant, or Urpflanze, as the basis
for all botanic growth and form, Laffoley ima- 
gines bioengineering a ginkgo tree — one of 
the oldest known fruiting plants — to sprout 
pods that people could live in. 

He suggests that the high electrical poten- 
tial of spinach could be harnessed to power 
such a home and that bioluminescence 
could light it. This idea is not entirely fan- 
ciful: architects and synthetic biologists are 
already working together on carbonate shells 
and bioluminescent lighting for buildings 
(see Nature 467, 916-918; 2010). 

Laffoley, who The New York Times called 
“one of the most unusual creative minds of 
our time”, believes we are entering a new
phase of modernism that will entail an archi- 
tecture that is physically alive. He calls it the 
Bauharoque, mixing the Bauhaus school of 
design's utopian ideals with the theatricality 
of Baroque art and architecture. 

This aesthetic is shared by Canadian 
architect Richard Greaves’s human nests, in 
which windows dangle and branches and 
beams canoodle. Greaves doesn’t use nails, 
but binds his cabins with rope so that the 
structures can move. His precarious shel- 
ters, made in a Canadian field from recycled 
wood and salvaged architectural materials, 
are on show in photographs and a model. 

Equally motile and dramatic are walk- 
ing, jumping, wall-climbing robot dolls by 
Wu Yulu, a Chinese farmer and self-taught 
roboticist. Their shabby, child-like appear- 
ance seems more humane than shiny plastic 
and metal cyborgs, or the robot cosmonauts 
sketched in the 1950s and 60s by French civil 
engineer Jean Perdrizet that are also on show. 

Physics and maths receive a fresh take 
here. Philip Blackmarr depicts his ‘quan- 
tum geometry’ in pen-and-ink drawings of 
vibrating sinusoidal waves so precise that 
they resemble computer printouts. Con- 
nections between the Mayan and Chinese 
number systems and Goethe's colour the- 
ory are explored in rainbow chequerboard 
paintings by American artist Alfred Jensen. 
George Widener, a professed ‘time traveller’ 
and autistic, can tell immediately what day 
of the week any future date will fall on, and 
turns dates into intricate sketches of magic 
number squares and cities. 

Any scientist visiting this alternative uni- 
verse will find themselves, as I did, poring 
over blueprints to try to figure out how the 
machines depicted work, or discerning the 
mathematical patterns behind the painted 
squares. Given how much we still don’t 
know, this show importantly asks: are you 
sure your universe is the right one?


Joanne Baker is senior comment editor at 
Nature. 


PAUL LAFFOLEY/PHOTO BY LINDA NYLIND 


Correspondence 


US patent rulings 
will fuel invention 


On 13 June, the US Supreme 
Court denied the validity of 
patenting genes (Nature 498, 
281-282; 2013) — but this is 
only part of the story. Since 
2010, the court has made three 
separate landmark rulings that 
give inventors full access to 
the wellspring of ideas, laws of 
nature and natural products. 

Patent law requires ingenuity 
and invention for patenting 
a discovery. The Supreme 
Court established in 1980 that 
genetically modifying cells to eat 
oil, resist pesticides or produce 
insulin, for example, was a 
patentable invention. 

After the draft human genome 
was released in 2001, the US 
Patent and Trademark Office 
stipulated that only genes 
of known function could be 
patented. Into this category fell 
BRCA1 and BRCA2, the genes 
mutated in some breast and 
ovarian cancers, which were 
patented by the Utah firm Myriad 
Genetics. But questions arose — 
hadn't the firm simply extracted a 
natural product? Did it ‘own’ the
genetic information within? 

The court subsequently ruled 
that a patent that pre-empts all 
uses of a natural product was 
disallowed (I was a plaintiff in the 
case). In separate cases in 2010 
and 2012, it also ruled against 
patents that pre-empt all uses of an 
abstract idea or of a natural law.

I disagree that these rulings 
could stifle US innovation: they 
set a higher bar for genuine 
invention so that people will 
gain from better medicines and 
devices. And they will retain 
ownership of their genomes. 
Harry Ostrer Albert Einstein 
College of Medicine, New York, USA. 
harry.ostrer@einstein.yu.edu 


Will China expand on 
its carbon trading? 


China's current pilot schemes for 
carbon-emissions trading are 
the forerunners to a nationwide
carbon market slated for 2016
(Nature 498, 145-146; 2013). 
This has prompted international 
speculation that China might 
adopt an absolute cap on national 
emissions by 2020. We contend 
that future Chinese climate policy 
is unlikely to rely mainly on cap 
and trade, and so will not depend 
on the success of pilot schemes. 

In our view, the schemes are 
not likely to deliver a carbon 
price that reflects its social cost 
or provides an incentive for long- 
term investment in low-carbon 
technologies. The government 
may bring in other instruments 
in parallel (such as carbon 
taxes or mandatory emissions 
standards), which would distort 
the carbon price in China as they 
have in Europe. 

The Chinese government 
should not allow the carbon 
prices emerging from its pilot 
trading schemes to distract 
attention from the real costs of 
moving to a low-carbon economy. 
Xi Liang, Francisco Ascui 
University of Edinburgh, UK. 
xi.liang@ed.ac.uk 
David Reiner University of 
Cambridge, UK. 


Protection for trade 
of precious rosewood 


Madagascar’s rosewood trees 
(Dalbergia spp.), prized for 
their hard, burgundy-coloured 
wood, are under threat after 
being exploited to make high- 
quality furniture and musical 
instruments. 

Earlier this year, rosewoods 
won greater trade protection at 
the Convention on International 
Trade in Endangered Species of 
Wild Fauna and Flora (CITES) 
conference in Bangkok. The 
challenge now, as for CITES 
designations globally, is to 
implement and enforce this 
protection. 

Despite previous logging and 
shipping bans on hardwoods 
from Madagascar, and even a 
voluntary CITES Appendix III
listing of five Dalbergia species in
2011, illegal logging has persisted
in the wake of the country’s
political turmoil in 2009. The 
current Appendix II listing will 
create legal obstacles to illegal 
trade through a permit system 
that allows only non-detrimental 
harvesting practices. 

Meredith A. Barrett University 
of California, San Francisco, USA. 
barrettm@chc.ucsf.edu 

Jason L. Brown, Anne D. Yoder 
Duke University, Durham, 

North Carolina, USA. 


Identical twins don’t 
share fingerprints 


As chair of the Forensic 
Identification Standards 
Committee of the International 
Association for Identification,
I would like to point out an
error in your obituary of 
Joseph Murray regarding the 
fingerprinting of identical twins 
(Nature 493, 164; 2013). 

Murray did ask for Richard 
and Ronald Herrick to be 
fingerprinted to determine 
whether they were identical 
before he transplanted a 
kidney from one to the other 
(J. E. Murray Surgery of the Soul; 
Watson, 2001). Yet the Boston 
police archives have no record of 
the fingerprint request or of its 
results (I. Truta and M. Sullivan, 
personal communication). 

The twins’ fingerprint
classification codes were probably
tested for similarity, although
this would not indicate twinning
because unrelated people often
share the same classification code.
I could find no evidence that “the
twins’ fingerprints were identical”,
as the obituary states. Had they
been, I am confident that forensic
science would have taken notice
in 1954.

Different people, including 
genetically indistinguishable 
twins, do not deposit identical 
fingerprints (see, for example,
X. Tao et al. PLoS ONE 7, e35704;
2012). 

John R. Vanderkolk Indiana 
State Police Laboratory, 

Fort Wayne, Indiana, USA. 
vanderkolkjohn@yahoo.com 


Latin America should 
ditch impact factors 


Increased reliance on impact 
factors to evaluate scientific 
merit is having negative social 
and environmental effects in 
Latin America. We should 
abandon these indicators and 
concentrate on strengthening 
regional and national journals 
and networks for socially and 
locally relevant research. 

Impact-factor rankings 
have damaged the region for 
several reasons. Because impact 
factors are generally low for 
conservation and ecology articles 
(compared with those in, say, 
biotechnology or medicine), 
these disciplines attract 
proportionately less funding. 
Top-tier journals tend to focus 
on global environmental issues 
to boost citation rates, at the 
expense of regionally important 
ones. And theoretical-ecology 
journals have higher impact 
factors than applied-ecology 
journals. 

Together, these metrics are 
diverting researchers away from 
regional problems even as socio- 
ecosystems deteriorate around 
them. The South American 
biogeographic region comprises 
10% of Earth's surface and hosts 
50% of its biodiversity, yet the 
continent contributed less than 
4% of global scientific output 
in 2010 (see go.nature.com/ 
hudjwn; in Spanish). 

We suggest that Latin 
America should aim to achieve 
a genuine knowledge dialogue 
(see go.nature.com/ifrnlx; in 
Spanish) through confronting 
regional challenges, rather than 
focus on increasing its global 
“brain circulation” (Nature 490, 
325; 2012). 

Adrian Monjeau Fundacion 
Bariloche; and CONICET, 
Argentina. 

amonjeau@gmail.com 

Jaime R. Rau University of 

Los Lagos, Osorno, Chile. 
Christopher B. Anderson 
CADIC-CONICET; and National 
University of Tierra del Fuego, 
Ushuaia, Argentina. 
OBITUARY


Heinrich Rohrer 


(1933-2013) 


Co-inventor of the scanning tunnelling microscope. 


Heinrich Rohrer, Heini to those who
knew him, helped to open the door
to nanotechnology. With Gerd
Binnig, he produced a device that allowed 
researchers to image and measure atoms 
and molecules, and to manipulate them. 

Rohrer, who died on 16 May, 
three weeks before his 80th 
birthday, was born in 1933, half 
an hour after his twin sister. He 
grew up in the village of Buchs in 
eastern Switzerland. Rohrer stud- 
ied physics at the Swiss Federal 
Institute of Technology (ETH) 
in Zurich, where he remained to 
pursue a PhD. It was during his 
PhD years that he first came into 
contact with the nanometre scale, 
through studying the properties of 
superconductors. 

After receiving his PhD in 
1960, Rohrer pursued a two-year 
postdoctoral research fellowship 
at Rutgers University in New Jer- 
sey, working on superconductors 
and metals. At the end of 1963, he 
joined the IBM Research Laboratory
in Rüschlikon, Switzerland,
on the recommendation of various
peers including physicist Bruno Lüthi, who
had worked alongside Rohrer at the ETH. 

Towards the end of the 1960s, Rohrer 
began working on an antiferromagnet called 
gadolinium aluminate (GdAlO3) in collaboration
with Keith Blazey, another physicist
at the IBM lab. Antiferromagnetism is a 
type of magnetic ordering that vanishes at a 
certain temperature. The work brought 
Rohrer into the field of critical phenomena 
and led to crucial findings about magnetic 
phase transitions. By this point, the group 
at the IBM lab had established a world- 
renowned reputation in critical phenomena, 
after K. Alex Müller — then head of physical
science — had pioneered the field of struc- 
tural phase transitions. 

In the late 1970s, Rohrer’s interest shifted 
towards the complex structure of surface 
materials. In building ever-faster com- 
puters, the semiconductor industry was 
rapidly approaching the design of chips on 
the nanoscale. Yet few tools were available 
to study the structure and properties of 
materials at this scale. In 1978, Rohrer 
insisted that the IBM lab hire Gerd Binnig, 
a young German physicist from Frankfurt
University, and the two started to
contemplate a new device. By 1981, the
pair had designed the world’s first scanning
tunnelling microscope (STM).

30 | NATURE | VOL 499 | 4 JULY 2013

Unlike conventional microscopes, the 
STM did not use lenses. Instead, a probe 
sharpened to a single atom at the tip was
moved close enough to the surface of a
conductive material, such as silicon or gold, 
for the electron wavefunctions of the atoms 
in the tip to overlap with those of the atoms 
in the conductive surface. (Picture two overlapping
electron ‘clouds’.) When a voltage
was applied to the tip, electrons started to
‘tunnel’, or ‘leak’, through the vacuum gap,
causing a current to flow from the foremost 
atom of the tip into the surface. 

Moving the tip by the diameter of a 
single atom changed the current by a factor 
of a thousand, giving the device its enormous 
resolution. As the tip was scanned back and 
forth, it followed the atomic structure of the 
surface, extending and retracting over dips 
and bumps. Thus, for the first time, it was 
possible to get up close and personal with 
atoms in three dimensions. 
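
The steep dependence of current on distance can be illustrated with a rough calculation. The sketch below is not taken from Rohrer and Binnig's work: it assumes the standard textbook picture of an electron tunnelling through a one-dimensional rectangular barrier at low voltage, in which the current falls off as exp(-2κd) with κ = sqrt(2mφ)/ħ. The barrier height of 4.5 eV (a typical metal work function) and the function names are assumptions chosen for illustration.

```python
import math

# Physical constants in SI units
HBAR = 1.054571817e-34   # reduced Planck constant, J s
M_E = 9.1093837015e-31   # electron mass, kg
EV = 1.602176634e-19     # one electronvolt, J


def decay_constant(barrier_ev):
    """Inverse decay length kappa (1/m) of the electron wavefunction
    inside a barrier of the given height (assumed rectangular)."""
    return math.sqrt(2.0 * M_E * barrier_ev * EV) / HBAR


def current_ratio(gap_change_m, barrier_ev=4.5):
    """Factor by which the tunnelling current changes when the
    tip-surface gap changes by gap_change_m (in metres).
    The 4.5 eV default barrier is an illustrative assumption."""
    return math.exp(2.0 * decay_constant(barrier_ev) * gap_change_m)


if __name__ == "__main__":
    # Gap changes from a fraction of an atom up to roughly one atomic diameter
    for delta_nm in (0.1, 0.2, 0.3):
        ratio = current_ratio(delta_nm * 1e-9)
        print(f"gap change of {delta_nm} nm: current changes by a factor of about {ratio:.0f}")
```

Under these assumptions the current changes by roughly an order of magnitude for every 0.1 nanometres, so a shift of about one atomic diameter (taken here as roughly 0.3 nanometres) gives a factor of the order of several hundred to a thousand, consistent with the figure quoted above.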

Nobody believed that Rohrer and Binnig's 
experiments demonstrating quantum tun- 
nelling could ever work. A tremendous 
challenge was bringing the tip only 0.2 nano- 
metres away from the surface (1 nanometre 
is 1 billionth of a metre). However, a clev- 
erly designed mechanism using the forces of 
strong magnets did the trick. 

Rohrer and Binnig’s initial results were 
eventually verified by other groups and
presented at a workshop on the STM in the 
Austrian Alps in 1985. Devices such as the 
atomic force microscope (AFM) — a very
high resolution type of scanning microscope 
that measures the atomic forces between the 
tip of a probe and the surface being
scanned — have their roots in this 
meeting. During the last night of 
the workshop, the mountains were 
abuzz with crazy ideas about how 
such microscopes might be used 
in applications in all sorts of fields, 
from fundamental physics and 
chemistry to information tech- 
nology, quantum computing and 
molecular electronics, as well as in 
the life sciences. 

In 1986, Rohrer and Binnig 
shared half of the Nobel Prize in 
Physics. The other half of the prize 
was given to the German physicist 
Ernst Ruska for inventing the scan- 
ning electron microscope, a device 
that produces images of a sample 
by scanning it with a focused beam 
of electrons. 

With the emergence of scanning 
probe microscopes (the STM and 
the AFM are just two among many types of 
these), the door to the nanoworld was pushed 
wide open. Today, such tools are still making a 
tremendous impact on numerous disciplines. 

An extraordinarily charismatic man, 
Heini went on to promulgate nanoscience 
and nanotechnology to upcoming genera- 
tions of researchers. I remember a lecture he 
gave in South Korea some years ago, which 
was attended by almost 4,000 high-school 
and university students. His captivating 
description of the development of the STM 
was followed by thunderous applause. In 
fact, one of the attendees recently told me 
that it was Heini’s talk that inspired him to 
study physics and nanoscience. 

Heini will be deeply missed as a natural 
leader, a visionary, a stimulating scientist and 
a wonderful person. He is survived by his 
wife Rose-Marie Egger, his two daughters 
Doris Rohrer Hansen and Ellen Rohrer, and 
two grandchildren.


Christoph Gerber collaborated with 
Rohrer at the IBM Research Laboratory for 
many years. He is at the Swiss Nanoscience 
Institute, University of Basel, Switzerland. 
e-mail: christoph.gerber@unibas.ch 

The cost of children


An investigation into the causes of the decline in the number of children being born finds that economic motivations are 
more influential than child mortality or social learning, but also reinforces the relatedness of these factors. 


RUTH MACE 


Now that my own children are teenagers
and I work in a university in a large
city, I can go for weeks without properly
interacting with a child. This miserable
and unnatural state of being is in part because 
the society I live in is WEIRD (Western, edu- 
cated, industrialized, rich and democratic)’. 
But the dramatic and near-universal decline 
in birth rate and family size that has been one 
of the most pervasive social changes of the past 
two centuries, and that continues apace around 
the world, means that my situation is far from 
unusual. The question of why fewer babies are 
being born gets to the heart of what matters to 
us in life: do we value money, or prestige, or is it 
reproductive success? Writing in Proceedings of 
the National Academy of Sciences, Shenk et al.’ 
attempt to disentangle the relative importance 
of three main classes of influence — risk and 
mortality, economic and investment, and 
cultural transmission — on fertility*. 

Demographers have traditionally placed 
great emphasis on the reduction in mortality 
as the main cause of the reproductive decline. 
There is no doubt that this is one driver of 
the transition to low birth rates, but its fail- 
ure to predict all aspects of the phenomenon 
led some researchers to propose the cultural 
influence of new ideas as a major determinant. 
Still others have emphasized the effects of the 
changing costs and benefits of children in the 
modern world. Shenk and colleagues tried 
to rank these factors by using a detailed data 
set from a sample of women in a population 
that is currently changing from large to small 
families. This is an elegant and comprehensive 
study that was much needed. 

*This News & Views article was published online on 12 June 2013.

Figure 1 | Fertility in decline. A woman and child from the Matlab region of Bangladesh where Shenk
et al.” conducted research aimed at understanding why women are, on average, having fewer children
than in previous decades.

The setting for the research is the Matlab
region of rural Bangladesh, an area that has
been the subject of demographic surveillance
for many years by the International Centre for
Diarrhoeal Disease Research, Bangladesh. The
authors interviewed a representative sample
of about 1,000 women from this wider study
to investigate the relevant socio-economic
variables that might be influencing their
fertility decisions (Fig. 1). The study is notable
for its statistical approach, which attempts to
compare the three groups of variables
according to their relative effectiveness at predicting
the data. To explain to those of us brought up 
on statistical tests of null hypotheses and P val- 
ues (which my postdocs tell me are now passé), 
this approach represents a formal method for 
evaluating the relative success of defined sets of 
variables (models) at predicting the observed 
data, using likelihood theory and a measure 
known as the Akaike information criterion. 
The authors found that the economic-and- 
investment model predicted fertility rates 
much better than either of the other two mod- 
els. Their tests show that mortality variables 
do have a strong influence on the number of 
births (we have long known that dead babies 
are quickly ‘replaced’ with more births), but 
they seem less relevant to the more interesting 
question of what determines the final family 
size of surviving children. Correlates of cul- 
tural norms do not seem to have much pre- 
dictive power. Owning farmland emerges as 
key, along with several other factors associated 
with the costs of spreading parental investment 
among many offspring. The authors’ results
also indicate that education is very impor- 
tant, and is one of the things that comprise 
that parental investment. I have long been an 
advocate of heritable, transferable resources 
having a crucial role in fertility decline’, so am 
not surprised by this conclusion, but, as the 
authors themselves observe, the story may not 
be as clear cut as the statistics suggest. 
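
For readers unfamiliar with this kind of model comparison, the sketch below shows how the Akaike information criterion ranks competing models. The log-likelihoods and parameter counts are invented purely for illustration and are not taken from Shenk and colleagues' paper; the model labels and function names are likewise hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2.0 * n_params - 2.0 * log_likelihood


def akaike_weights(aic_by_model):
    """Relative support for each model, normalised to sum to 1."""
    best = min(aic_by_model.values())
    rel = {name: math.exp(-0.5 * (a - best)) for name, a in aic_by_model.items()}
    total = sum(rel.values())
    return {name: r / total for name, r in rel.items()}


# Hypothetical fits: (maximised log-likelihood, number of parameters).
fits = {
    "economic_investment": (-1510.0, 8),
    "risk_mortality": (-1540.0, 6),
    "cultural_transmission": (-1555.0, 7),
}

scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
weights = akaike_weights(scores)
best_model = min(scores, key=scores.get)

if __name__ == "__main__":
    for name in sorted(scores, key=scores.get):
        print(name, scores[name], weights[name])
```

The criterion rewards fit but penalises each extra parameter, so a model wins only if its additional variables earn their keep. As the text notes, the ranking still depends on which variables are assigned to which model.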

It is interesting that this particular popula- 
tion is often held up as an example of one in 
which fertility declined without the usual cor- 
relates of economic development. For exam- 
ple, only a trivial fraction of the Bangladeshi 
women were in jobs requiring education, and 
indeed, most married women rarely left their 
compound. This suggests that the influence 
of education is more cultural than economic. 
However, the statistical approach chosen influ- 
ences the conclusions drawn — the choice of 
which variables apply to each hypothesis is a 
key factor, and the data-sampling strategy is 
another. In this case, Shenk et al. listed educa-
tion in both the economic-and-investment and
the cultural-transmission categories, so the
comparison between these two models effec- 
tively relies only on the other chosen variables 
(which may be less important). 

It has also been argued that education itself 
enhances the cultural transmission of low- 
fertility norms*”, but such interactions were 
not tested in this study. A multilevel model 
that incorporates locally clustered rather than 
randomly dispersed data would be required 
to identify peer-to-peer, village-level or other 
contextual effects®. Studies at this particular 
site’ and elsewhere® have shown how religion 
influences the uptake of contraception, sug- 
gesting a significant role for cultural norms. 
But Shenk and colleagues did not find the 
influence of religion to be substantial enough 
— relative to education and other factors — to 
be included in the cultural-transmission model 
that was used in the overall model comparison.
One cannot rule out effects that have not been
tested for.

Interacting factors make such studies challenging, even with sophisticated
statistics. It is clear that cultural transmission is one of the ways by which
humans learn that the costs and benefits associated with certain processes
— such as raising children — have changed, or might change in the future.
Thus, as Shenk et al. say, both economics and investment and cultural
transmission are important and have complementary effects. The reproductive
decisions of those of us with small families do not seem to maximize our
genetic fitness, despite the numerous financial, health-related, educational
and other individual benefits associated with low fertility’. This is one
reason why the topic fascinates evolutionary anthropologists, and explains
why they have been among the most enthusiastic to pick up the baton handed
on from demographers, and why they continue to run with it.

Ruth Mace is in the Department of
Anthropology, University College London,
London WC1E 6BT, UK.
e-mail: r.mace@ucl.ac.uk


PLANETARY SCIENCE

The robustness of planet formation

The detection of two planets in an open star cluster demonstrates that planetary
systems are able to survive disruptive events, such as supernova explosions,
during the dense, early stages of the life of such clusters. SEE LETTER P.55

WILLIAM F. WELSH

On a clear night, a person admiring the heavens may see a few thousand stars.
The stars are more or less randomly distributed, although some are gathered in
groups known as open clusters, the most famous being the Pleiades (Fig. 1).
Stars, and presumably their planets, are born in such clusters. But despite
considerable effort'”, evidence for planets in clusters is frustratingly
scarce: of more than 850 planets that are currently known’, only four reside
in clusters*°. And whereas more than 10,000 stars in open clusters have been
examined, before the study by Meibom et al.’ reported on page 55 of this
issue, no transiting planets had been detected*. Transits, which are
mini-eclipses that occur when a planet passes in front of its star, are
especially valuable because they allow a planet's size to be estimated.
Meibom and colleagues detected two sub-Neptune-sized transiting planets in
the open cluster NGC 6811, out of a sample of only 377 stars. This remarkable
success rate was made possible by the ultra-precise data provided by NASA’s
Kepler Mission’.

*This article and the paper under discussion’ were
published online on 26 June 2013.

Dozens of open clusters are easily visible with a small telescope, enrapturing
stargazers with the sparkling of tens to thousands of stars. The stars in a
cluster are all kin, born from the same parent molecular cloud — a huge mass
of cold gas and microscopic grains teetering on the brink of instability. If a
cloud is nudged, it can be set down a path of cascading gravitational
collapse, fragmenting into thousands of dense clumps. These condensations
become smaller and smaller, heating up as they compress, until, finally,
thermonuclear-fusion ignition occurs and a star is born. Leftover matter
that does not form the star will continue to orbit it and may coalesce into
planets.

A cluster can be far from a placid environ- 
ment for planet formation. As stars pass each 
other, their gravity can tug on planets and 
planet-forming disks, and stellar winds and 
intense ultraviolet light from hot young stars 
can dissipate the star- and planet-forming 
cloud. The larger and denser the cluster, the 
more important these effects are. Although 
NGC 6811 is a small cluster, it is not a young 
one’, and this is significant. As a cluster ages, 
it ‘dissolves’ away, its stars mixing with the 
myriad stars of the Milky Way; sibling stars
become spread across the Galaxy. The time
that it takes for the cluster to disperse, between
around 10 million and 100 million years, is
generally short compared with the lifetime of
most stars. But about 10% of clusters persist
well beyond that age, their gravity being strong
enough to slow the dispersal of the stars.

Figure 1 | A good age. The Pleiades star cluster is roughly 100 million years old. By contrast, the open
cluster NGC 6811, within which Meibom and colleagues have detected’ two transiting planets, has
survived to an age of 1 billion years.

To have its current number of stars, 
1 billion years after its formation, NGC 6811 
must have contained a much greater number of 
stars in the past. Conditions would have been a 
lot more hostile then than they are today, with 
numerous stellar encounters and significant 
evaporation of the natal cloud by hot stars. 
The planets and planet-forming disks may 
have even endured several nearby supernova 
explosions. The discovery of planetary systems 
that have withstood this challenge tells us that 
planet formation is robust — nature likes to 
create planetary systems, and many survive the 
birthing process. 

The planets in NGC 6811 are respectively 
only 2.8 and 2.9 times Earth’s radius. Most of 
the planets found by Kepler are around this 
size’’. The planets’ orbital periods are 18 and 
16 days, also common for Kepler-discovered 
bodies. So these two planets seem quite typi- 
cal. However, Meibom and colleagues could 
not measure the planets’ masses because the 
host stars were too faint, and so the authors 
relied on statistical arguments to validate the 
planets. Using the well-established BLENDER 
procedure described in the paper’s Supplemen- 
tary Information, Meibom et al. estimated that 
the chance of a planetary transit signal being a
false positive was less than 0.24%. It is probably 
much less than this, as the authors were quite 
cautious in their estimates, and rightly so, as 
the validation depends in part on estimating 
the occurrence rate of planets in clusters, and 
this is not independently known. 
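
To make concrete how a transit yields a planet's size, the fractional dimming of the star is approximately the square of the ratio of the planetary to the stellar radius. The sketch below applies this to a planet of 2.8 Earth radii. The assumption of a star with exactly the Sun's radius is purely illustrative (the host stars' radii are not given here), and the function names are hypothetical.

```python
import math

R_SUN_KM = 695_700.0    # nominal solar radius
R_EARTH_KM = 6_371.0    # mean Earth radius


def transit_depth(planet_radius_earth, star_radius_sun=1.0):
    """Fractional drop in starlight during transit: (Rp / Rs) ** 2."""
    rp = planet_radius_earth * R_EARTH_KM
    rs = star_radius_sun * R_SUN_KM
    return (rp / rs) ** 2


def planet_radius_from_depth(depth, star_radius_sun=1.0):
    """Invert the relation: planet radius in Earth radii from a measured depth."""
    return math.sqrt(depth) * star_radius_sun * R_SUN_KM / R_EARTH_KM


if __name__ == "__main__":
    depth = transit_depth(2.8)
    print(depth)                          # well under 0.1% of the star's light
    print(planet_radius_from_depth(depth))
```

A dip of this size, less than a tenth of one per cent, illustrates why the precision of the Kepler data mattered for faint cluster stars.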

An obvious limitation of the study is that it 
is based on only two planets in just one clus- 
ter. However, the characteristics of the stars in 
NGC 6811 are similar to those of non-cluster 
(field) stars, and the occurrence rate of plan- 
ets in the cluster and in the field were both 
obtained from Kepler measurements. Thus, 
the comparison between the two rates is 
straightforward, without the usual contortions 
needed to compare surveys that have different 
sensitivities and biases. 

Most stars are thought to have formed in 
clusters smaller than the primeval NGC 6811, 
and thus probably in calmer environments. 
Yet Meibom and colleagues demonstrate that 
the planetary properties and occurrence rate 
in NGC 6811 are very similar to those of field 
stars. This implies that dense open-cluster 
environments do not significantly destroy 
planetary systems. Kepler has observed three 
other clusters, two of which, NGC 6819 and 
NGC 6791, are considerably older than 
NGC 6811, and presumably had even more 
dense and dynamically active nascent envi- 
ronments. NGC 6791 is especially interesting: 
a cluster this old, 7 billion to 9 billion years,
is rare, particularly given its enrichment of 
elements heavier than helium. It will be truly 
fascinating to learn the planet occurrence rate 
in this cluster.

Towards a million-year-old genome


The sequencing of a complete horse genome from a bone dating to around 
700,000 years ago sheds light on equine evolution and dramatically extends the 
known limit of DNA survival. SEE LETTER P.74 


CRAIG D. MILLAR & DAVID M. LAMBERT 


The field of ancient DNA continues
to break records. Ancient DNA and 
genomes provide a window on the 
recent evolutionary past and often reveal that 
history is more complex than we previously 
thought. Following on from the work of the 
evolutionary biologist Allan Wilson’ in the 
early 1980s, ancient DNA studies are now 
used to address three broad issues: the estima- 
tion of molecular rates of change using seri- 
ally preserved samples; the testing of specific 
evolutionary hypotheses; and the estimation 
of changes in genetic diversity and popula- 
tion sizes through time. In this issue, Orlando 
et al.” (page 74) address the latter two con-
cepts in their report of the complete genome
sequence of a horse that lived around 700,000
years ago. This genome is almost 10 times 
older than the previous record, which was for a 
Denisovan’, an archaic human dated at 
approximately 80,000 years before present*. 
The Middle Pleistocene horse genome was 
obtained using a bone fragment recovered from 
Arctic permafrost at Thistle Creek, Canada. 
For comparison, Orlando and colleagues also 
sequenced the genome of a Late Pleistocene 
horse (from around 43,000 years before pre- 
sent) and a Przewalski’s horse (Equus ferus 
przewalskii). The latter is considered the only 
remaining truly wild member of the Equus 
genus, and the new data show that it is the 
closest living relative of the domesticated 
horse. In addition, the authors sequenced the 
genomes of five domestic horse breeds (Equus 
ferus caballus) and a donkey (Equus asinus). 
They then used these data to estimate several
evolutionary and population parameters of the
horse, which has been a textbook example in
evolutionary biology and palaeontology since
early work’ in the 1950s. For example, they
calculate the time to the most recent common
ancestor of members of the Equus genus to
be between 4.0 million and 4.5 million years
ago (Fig. 1), approximately twice the previous
estimate.

*This article and the paper under discussion² were
published online on 26 June 2013.

Figure 1 | Horse origins. Orlando and colleagues’
phylogenetic reconstruction’ was based on the
genomes of the present-day donkey, domestic
horses and the Przewalski’s horse, and those derived
from horse bones dating to approximately 43,000
(Late Pleistocene) and 700,000 (Middle Pleistocene)
years ago. This analysis allowed the authors to
estimate the time to the most recent common
ancestor of all members of the Equus genus to be
between about 4.0 million and 4.5 million years ago.

Figure 2 | Survival of the coldest. The rate of
DNA decay varies with environmental conditions,
as indicated by this plot of the estimated half-lives
of 30- and 100-base-pair (bp) DNA fragments
as a function of temperature’. The estimated ages
and temperatures of material used to recover
the genomes of a Neanderthal (N)¹⁰, a woolly
mammoth (M)¹¹ and the horse fossil discovered at
Thistle Creek, Canada (H)’, are shown.

Orlando et al. went on to show that the size 
of the horse population has fluctuated many 
times over the past 2 million years, particularly 
during periods of severe climatic change. Inter- 
estingly, they reveal that Przewalski’s horse has 
retained substantial genetic diversity, a fea- 
ture that could be significant for the species’ 
future conservation. Furthermore, they iden- 
tify genomic regions in domesticated horses 
that have been under positive selection; some 
of these might represent genetic signatures of 
domestication. 

But the implications of this work go well 
beyond the evolution of horses, by also 
providing evidence for the limits of DNA sur- 
vival. Until this study, many experts*® would 
have thought that it was impossible to recover 
a genome from a sample of this age because of 
the rapid degradation of DNA into ever shorter 
fragments that occurs following the death of 
an organism. The decay is driven initially by 
the body’s own enzymes, and the actions of 
enzymes from microorganisms soon follow 
— death shuts down the normal defences that 
protect an organism against such fates. This 
process is, of course, affected by environmen- 
tal conditions, including the presence of oxy- 
gen and water, the microorganisms present, 
pH and temperature. In general, the colder 
the environment, the slower the rate of DNA 
degradation (Fig. 2). 
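
The effect of temperature can be pictured with a simple first-order decay model, in which the fraction of fragments of a given length that remain intact halves with every half-life. The half-lives below are hypothetical placeholders chosen only to show the shape of the argument; they are not the values plotted in Figure 2, and the function name is an assumption.

```python
def surviving_fraction(age_years, half_life_years):
    """Fraction of DNA fragments still intact after age_years,
    assuming simple exponential (first-order) decay."""
    return 0.5 ** (age_years / half_life_years)


# Hypothetical half-lives for the same fragment length at two temperatures.
HALF_LIFE_COLD = 100_000.0   # years, permafrost-like conditions (assumed)
HALF_LIFE_WARM = 1_000.0     # years, temperate conditions (assumed)

AGE = 700_000.0              # approximate age of the Thistle Creek bone

cold = surviving_fraction(AGE, HALF_LIFE_COLD)
warm = surviving_fraction(AGE, HALF_LIFE_WARM)

if __name__ == "__main__":
    print(cold)   # a small but usable fraction survives
    print(warm)   # effectively nothing survives
```

Because the surviving fraction depends exponentially on the ratio of age to half-life, even a modest lengthening of the half-life in the cold makes the difference between a recoverable genome and none at all.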

Orlando and colleagues’ success is undoubt- 
edly due to the extreme cold in which the 
bone resided and its resulting preservation, 
combined with advances in second-genera- 
tion gene sequencing, including true single- 
molecule sequencing technology. Technical 
developments in DNA recovery and the con- 
struction of DNA-sequencing libraries also 


contributed to the authors’ achievements. 
From this same sample, they were able to 
sequence 73 proteins, including some found 
in blood. This illustrates that other meth- 
ods apart from DNA sequencing can now 
be applied, on a large scale, to studies of the 
deep past. 

So just how long can animal DNA survive? 
Recent work has modelled the absolute lim- 
its of DNA survival, and suggested that DNA 
more than 1 million years old may be recover- 
able from very cold environments’. Interest- 
ingly, the age of the horse genome recovered 
by Orlando et al. falls comfortably within these 
predicted limits of DNA survival (Fig. 2), sug- 
gesting the tantalizing proposition that com- 
plete genomes several millions of years old may 
be recoverable, given the right environmental 
conditions. Indeed, Orlando and colleagues’ 
study encourages us to wonder if it might be 
possible to recover DNA from a wide range 
of Middle Pleistocene samples. Of particu- 
lar interest would be material from ancestral 
human species® such as Homo heidelbergensis 
and Homo erectus’. Such genomic informa- 
tion, in combination with the Denisovan* and 
Neanderthal¹⁰ genomes, would undoubtedly
shed light on the evolution of humans and our
hominin ancestors, in much the same way as 
Orlando and colleagues’ study provides insight 
into the evolution of horses and the survival of 
DNA itself.

Mutations close in on gene regulation


A genome-wide analysis of DNA and RNA sequences, gene expression and DNA 
modifications in 200 samples of acute myeloid leukaemia sets the stage for data 
integration and verification that will enhance our understanding of this cancer. 


STEIN AERTS & JAN COOLS 


Acute myeloid leukaemia exhibits variable
genetics, presentation and clinical
outcome. Writing in the New England 
Journal of Medicine, Ley and colleagues’ from 
the Cancer Genome Atlas Research Network 
present the first comprehensive genome- 
wide analysis of DNA sequences, transcribed 
messenger RNA and microRNA molecules, 
and DNA modification by methylation, in 
200 cases of adult acute myeloid leukaemia 
(AML). The data, which are publicly available, 
provide unprecedented insight into the molec- 
ular genetics of this cancer and its influence on 
treatment responses. Although the challenge 
of integrating and functionally verifying these 
data remains, the findings are expected to help 
to explain the biology of AML, and could lead 
to the development of therapeutic strategies. 
Historically, the identification and charac- 
terization of individual genetic modifications, 
such as chromosomal translocations, gene 
fusions and gene mutations, have fuelled our
understanding of the onset and progression 
of AML. More recently, whole-genome and 
whole-exome sequencing studies have fur- 
ther refined this view, identifying mutations 
in genes in which they were not expected, such 
as DNMT3A, IDH1, PHF6 and SMC3. (The 
exome is the portion of the genome compris- 
ing exon sequences — those that form mature 
mRNA molecules.) Now that our knowledge 
of DNA-sequence mutations in AML has 
advanced, it is time for greater integration of 
this information with data on gene expression. 

Deregulation of gene expression is central 
to cancer development. For example, many 
cancer-related mutations result in reduced 
expression of genes that are associated with 
apoptotic cell death or cell senescence, or alter 
the expression of genes involved in cell pro- 
liferation and differentiation. These changes 
are often caused by perturbed activity of 
proteins involved in transcriptional control. 
Understanding the role of gene expression in 
cancer will require analysis of epigenetic modifications
(structural and chemical genomic
changes, such as DNA methylation, that do
not change the DNA sequence) and structural
changes in chromatin (the complex of DNA
and associated histone proteins).

Figure 1 | Frequent mutations in acute myeloid leukaemia. The Cancer Genome Atlas Research
Network’ presents an analysis of mutations that are repeatedly seen in patients with acute myeloid
leukaemia. All of these mutations can be linked to the regulation of gene expression. They include:
mutations in transcription factors and signalling proteins; mutations in factors that regulate the
methylation of DNA and associated histone proteins; mutations in the protein complex cohesin, which
regulates chromatin structure; and mutations in proteins involved in splicing, a process that regulates the
amount and type of mRNA molecules formed. The protein products of mutated genes are shown in red.

Remarkably, all of the mutations that Ley 
and colleagues identify in their AML survey, 
and posit to be cancer-driving, can be asso- 
ciated with the regulation of gene expression 
(Fig. 1). For example, the authors found several 
frequent mutations in transcription factors, 
including TP53, WT1, RUNX1, CEBPA and
the PML-RARA fusion protein. The effects of
such mutations are directly attributable to the
altered regulation of direct and indirect target
genes, as shown extensively’ for regulation by
MYC — a transcription factor often mutated
in cancer. 

Another category of typical cancer drivers 
is signalling proteins. Altered activity of 
signalling pathways will affect the activity 
of downstream transcription factors such as 
STAT proteins, MYC, ETS, NF-κB and AP-1,
and so signalling-protein mutations will also 
affect gene regulation. According to Ley and 
colleagues’ study, the most frequent of such 
mutations in AML are in the proteins FLT3, 
KIT, NRAS and KRAS. 

A further category of cancer-driving muta- 
tions is those that target chromatin-modifying 
proteins and DNA methylation factors, which 
can have broad effects on transcriptional con- 
trol by promoting or inhibiting the accessibility 
of the DNA to transcription factors or other
proteins. The most recurrent of these altera- 
tions found by the authors were in the genes 
that encode the proteins MLL, TET2, IDH1, 
IDH2 and DNMT3A. In addition to these 
three main categories, the authors identified 
mutations that affect cohesin, a protein com- 
plex that influences structural chromatin inter- 
actions, and the spliceosome, a complex that 
regulates the amount and type of mature RNA 
transcripts formed. 

Several lines of investigation remain to be 
addressed. How does the interplay of these 
mutations lead to cancer gene-expression 
profiles? How do these expression profiles 
ultimately lead to a proliferation advantage in 
cancer cells? And how do certain expression 
profiles represent different cancer character- 
istics or cancer subtypes that have different 
clinical properties? Ley et al. did not aim to 
address the enormous challenge of answering 
these questions, but they have paved the way 
to do so by providing invaluable data sets for 
further analysis and integration. 

The integration of epigenomic and genomic 
data with gene-expression data will nevertheless 
be complicated by the heterogeneity of AML 
(or that of any other cancer) and the unknown 
selection process that cancer cells go through 
before diagnosis and sample collection. It is well 
established that rare AML stem cells initiate this 
leukaemia, and it has been suggested that they 
may also be a source of cells for relapse”. It is 
therefore unclear whether the gene-expression 
profiles of the bulk of the AML samples in this
study will be informative for our understanding
of AML development, or if instead we need
expression profiles of the stem cells. Patient 
data alone will probably not be sufficient to 
optimally unravel significant relationships 
across all information layers. Animal models, 
including mice, zebrafish and flies, are increas- 
ingly needed to study the effects of cancer muta- 
tions and their combinations in a controlled 
manner. 

Another question that arises in relation 
to gene-regulatory perturbations is to what 
extent non-coding mutations may contribute 
to oncogenic processes. The AML study identi- 
fied a small number of recurrent mutations in 
microRNAs (small regulatory RNAs that do 
not encode proteins), but none in cis-regula- 
tory control elements (non-coding sequences, 
such as promoter regions, that typically con- 
tain transcription-factor binding sites). How- 
ever, recent evidence, such as the identification 
of mutations in the promoter sequence of the 
TERT gene in melanoma cells’, suggests that 
non-coding mutations can be important. 

The complexity of gene regulation and our 
poor understanding of regulatory genomic 
regions mean that identifying non-coding 
mutations that affect gene expression is 
unlikely to be a trivial task. Data from the 
ENCODE project’ could help us to prioritize 
candidate cis-regulatory variations, but addi- 
tional integrative genomics studies will inevi- 
tably be necessary. For example, studies using 
chromatin profiling (using DNase I hyper- 
sensitivity or FAIRE-seq assays) or histone 
modification (using ChIP-seq) during cancer 
initiation in model organisms could provide 
additional mechanistic clues about the early 
stages of cancer development. 

We now have an exceptionally clear view of 
the repertoire of protein-coding driver muta- 
tions in AML, how they cluster together and 
how they can predict patients’ responses to 
therapy’”. But this is not the end — we are only 
at the beginning of an era in which integra- 
tion of rich data sets that probe the genome, 
epigenome and transcriptome will help us to 
unravel the intricate regulatory connections 
between genetics and cancer.
Hot and deep


The landscape of Afar in Ethiopia (pictured) is 
tortured, because underlying tectonic plates 
are pulling apart from each other. Such rifting 
can lead to continental break-up, and is often 
accompanied by voluminous magmatism — 
the production of large amounts of melt. In 
this issue, Ferguson et al. report the cause of
this magmatism in Afar (D. J. Ferguson et al.
Nature 499, 70-73; 2013).

The authors developed a model of 
magmatism in the region using geochemical 
data from lavas that erupted along the rift. 
They conclude that melting is generated at 
great depths — 80 kilometres or more — 
and is driven by an unusually hot region of 
the mantle. 

Using another model, Ferguson and 
colleagues tracked the development 
of melting at the rift, and found that 
thinning of the tectonic plate over the past 
30 million years has been much less than 
expected. This suggests that an abrupt 
phase of plate thinning during the final 
stages of break-up would be required for an 
ocean basin to form in Afar.
Andrew Mitchinson
An acidic link


Obese people are at higher risk of multiple types of cancer, but why? One 
explanation could be that obesity enhances the production of pro-inflammatory, 
and carcinogenic, bile acids by gut microorganisms. SEE LETTER P.97 


SUZANNE DEVKOTA & PETER J. TURNBAUGH 


The rise in the global prevalence of obesity
has been accompanied by a wide array
of other morbidities, including diabetes, 
cardiovascular disease and cancer. Despite 
strong epidemiological data that link obesity 
to a higher risk of developing numerous 
cancers, the mechanisms underlying this 
connection remain unclear. A few impor- 
tant clues have been uncovered: that obesity- 
associated inflammation contributes to liver 
cancer’; that obesity is associated with marked 
changes to the trillions of microbes found in 
the gastrointestinal tract’; and that obesity- 
associated bacteria can produce inflamma- 
tory metabolites*. On page 97 of this issue, 
Yoshimoto et al.* present a plausible link by 
which deoxycholic acid, an obesity-associated 
by-product of microbial bile-acid metabolism, 
might contribute to hepatic inflammation
and the subsequent progression to cancer in
obese mice*. 

The authors began by using a strain of 
mouse in which expression of a gene that 
induces the senescence-associated secretory 
phenotype (SASP) can be monitored non- 
invasively by luminescence. Senescence, or 
cell-cycle arrest, has conventionally been 
viewed as a favourable process when trying to 
correct DNA damage and halt abnormal cell 
proliferation, but it has more recently been 
shown that senescent cells are active and can 
produce pro-inflammatory signalling proteins 
that promote tumour growth — key hallmarks 
of SASP*. Yoshimoto et al. found that tumour 
initiation with a chemical carcinogen triggered 
luminescence in the abdomen (indicative of 
liver cancer) of obese mice fed a high-fat diet, 
whereas lean mice fed a standard diet were 


*This article and the paper under discussion‘ were 
published online on 26 June 2013. 
protected from liver cancer. The researchers
saw a similar response in mice that were
deficient in the appetite-regulating hormone 
leptin, demonstrating that both dietary and 
genetically induced obesity can promote liver 
cancer. In situ analysis of gene expression in the 
livers of these obese mice revealed the expres- 
sion of multiple components of SASP. 

But how exactly does obesity stimulate 
SASP? The authors’ experiments using antibi- 
otics implicate obesity-associated gut microbes. 
Treatment of the mice with a cocktail of four 
antibiotics resulted in a marked reduction of 
liver cancer, as did treatment with another 
antibiotic, vancomycin, to target bacteria in 
the Firmicutes phylum, which are found at 
higher abundance in obese animals®. Further- 
more, the authors detected high serum levels 
of deoxycholic acid (DCA) in mice fed a high- 
fat diet, and observed that these levels were 
reduced by vancomycin treatment. DCA is a 
secondary bile acid produced by gut microbes 
such as Clostridium sordellii; it is known to be 
carcinogenic and has long been implicated in 
colorectal cancer’. When the authors inhibited 
microbial 7α-dehydroxylation, the biochemical
reaction that produces DCA, liver cancer
was suppressed in obese mice, whereas when 
they supplemented antibiotic-treated animals 
fed a high-fat diet with DCA, carcinogenesis 
was enhanced. 

Figure 1 | How obesity increases cancer risk.
Yoshimoto et al.’ show that mice that are obese as
a result of a high-fat diet or genetic predisposition
produce higher levels of deoxycholic acid (DCA),
a by-product of the bile-acid metabolism of certain
gut microbes. DCA is a carcinogen that can induce
DNA damage and the senescence-associated
secretory phenotype (SASP), which is associated
with tumour growth.

Together, these results emphasize the key
part that bile acids play in mediating host-microbe
interactions in the gastrointestinal
tract (Fig. 1). Bile acids are typically found 
at high-millimolar concentrations in the gut 
lumen, where they are converted to a diverse 
pool of secondary bile acids, including DCA. 
Enterohepatic circulation of bile acids pro- 
vides an efficient route by which these micro- 
bial metabolites could reach the liver. A recent 
study* showed that a high-fat diet stimulates 
bile-acid production that promotes the growth 
of Bilophila wadsworthia, a pathogenic bacte- 
rium that causes colitis in genetically suscepti- 
ble mice. Thus, microbial bile-acid metabolism 
may provide an under-appreciated mechanism 
by which our decisions at the dinner table 
can translate to disastrous consequences for 
our health. 

Now that more links between bile acids 
and disease are emerging, it will be crucial 
to revisit how gut microbes metabolize bile 
acids, and what effects their by-products 
may have. Could there be ‘metabolic power- 
house’ organisms that are highly adapted to 
use bile, similarly to the Bacteroides species 
that metabolize polysaccharides’? What are 
the key genetic and biochemical mechanisms 
underlying microbial bile-acid metabolism, 
and what tunes their activity in vivo? Multi- 
ple factors have already been shown to alter 
the overall pool of bile acids, including diet®, 
microbial colonization” and gastric-bypass 
surgery", but more work needs to be done to 
explore the health implications of these shifts
in humans. It will be important to develop 
comparative approaches that will allow for the 
comprehensive measurement of the bile-acid 
composition in various host tissues. These 
tools could enable us to have a broader under- 
standing of how the consumption of different 
dietary components alters the bile-acid pool, 
and whether or not this is a primary host factor 
that shapes the structure and metabolic activ- 
ity of gut microbial communities that are 
associated with diseases such as obesity. 

Furthermore, the by-products of micro- 
bial bile-acid metabolism are not only passive 
carcinogens but also active signalling mol- 
ecules. For example, bile-acid activation of 
the farnesoid X nuclear receptor and TGR5 
in the liver has been shown to regulate lipid 
and glucose homeostasis, as well as the synthesis
of the bile acids themselves’”. These receptors 
are also found outside the liver in tissues such 
as the heart, kidney and thymus, suggesting 
that bile acids have important roles throughout 
the body. 

Yoshimoto and colleagues’ work empha- 
sizes the need to consider both host and 
microbial factors that influence the tumour 
microenvironment. Continued collaboration 
between cancer biologists, microbiologists, 
immunologists and biochemists promises to
provide a mutually beneficial symbiosis aimed
at achieving a mechanistic understanding of 
the many ways in which microbial metabolism 
can contribute to or be shaped by cancer. This 
understanding brings with it the promise of 
better tools for risk assessment, diagnostics 
and therapeutic intervention.
Sensing when
it’s time for sex 


Malaria parasites switch between developmental stages to facilitate their 
transmission to the mosquito vector. This switch seems to be initiated by 
parasite-to-parasite communication through membrane-bound vesicles. 


LEANN TILLEY & MALCOLM MCCONVILLE 


Intercellular communication is crucial for
the development of complex multicellular
organisms. Single-celled organisms,
including bacteria, can also communicate
with each other by secreting small molecules
that are sensed by their compatriots, as well
as other organisms in their vicinity. These
signals enable microbes to sense nutrient conditions
or environmental stresses and to initiate
adaptive growth responses that optimize
population survival’. Papers by Regev-Rudzki
et al.” in Cell and Mantel et al.’ in Cell Host
Microbe report the identification of an unexpected
form of intercellular communication,
involving extracellular microvesicles, in the
deadly human malaria parasite Plasmodium
falciparum.

Parasitic microorganisms often switch
between different developmental stages in
their animal or human hosts. This trait allows 
them to invade various host cells and to 
improve their chances of being transmitted by 
an intermediate host or vector; in some cases, 
the switch has been shown to be mediated by 
secreted parasite factors’. In the case of P. falciparum,
a small proportion of parasites in the
rapidly dividing asexual stage — which prolif- 
erate in red blood cells (RBCs) of the human 
host — differentiate into the sexual gametocyte 
stage that is required for transmission through 
the Anopheles mosquito vector. This switch 
can be induced by stresses associated with 
overcrowding, the host immune response or 
exposure to drugs. Although there is evidence” 
that gametocyte differentiation can be triggered 
by secreted or released parasite factors, these 
have not been defined. 

Figure 1 | Release of vesicles from red blood cells. Uninfected red blood cells (RBCs) shed
cell-membrane-derived vesicles containing damaged cellular components. The vesicles are taken up by
immune cells that digest and dispose of this cellular debris. Regev-Rudzki et al.’ and Mantel et al.’ show
that RBCs infected with malaria parasites shed more vesicles, and that these exosome-like vesicles (ELVs)
can mediate transfer of parasite-derived DNA and proteins to other infected RBCs. This provides a
form of parasite-to-parasite communication that, among other possible functions, can induce parasites
in the recipient cells to differentiate from an asexual to a sexual (gametocyte) form. Vesicle release also
contributes to the symptoms of malaria infections by activating inflammatory responses from cells such
as macrophages.

The two new reports provide a potential
mechanism for how asexual-stage parasites
generate extracellular signals. Using com- 
plementary approaches, the studies demon- 
strate that small vesicles released from the cell 
membrane of cultured P. falciparum-infected
RBCs can mediate the transfer of DNA 
(specifically, plasmids that encode drug-resist- 
ance mediators or fluorescent proteins) from 
one infected RBC to another. These extra- 
cellular vesicles range from 70 to 250 nano- 
metres in diameter”’ and contain both RBC and 
parasite-derived proteins*. The authors refer 
to them as exosome-like vesicles (ELVs)* — 
by analogy to small vesicles that are generated 
in the endolysosome system of mammalian 
cells and shed extracellularly — or RBC- 
derived microvesicles*. The term ELVs will 
be used here. 

Circulating microparticles originating 
from RBCs have previously been reported in 
patients with malaria‘ and in mice infected 
with the rodent malaria parasite Plasmo- 
dium berghei’; in the latter, these have been 
shown to have strong pro-inflammatory 
activity. Similarly, the ELVs characterized in 
the current studies seem to be involved in the 
inflammatory response. They are taken up by 
other host cells, including macrophages and 
neutrophils, leading to cell activation and the 
production of cell-signalling molecules called 
cytokines (Fig. 1). This may benefit the para- 
site by increasing the expression of receptors 
on the endothelial cells to which infected RBCs 
adhere, thus avoiding parasite clearance in the 
host’s spleen’. It may also lead to deregulated 
inflammation and severe complications, such 
as cerebral malaria’. 


Remarkably, both groups found that ELVs 
are also internalized by other infected RBCs, 
leading to differentiation of parasites in the 
recipient cells into gametocytes (Fig. 1). The 
accumulation of ELVs in the serum of infected 
individuals may therefore constitute a signal 
that initiates and regulates the formation of 
transmissive parasite stages to maximize their 
passage to the mosquito vector. ELV forma- 
tion seems to be enhanced by drug treatment’, 
and so this process might also constitute a 
danger signal that drives the parasite into a 
resistant state. 

The current data provide convincing evi- 
dence for parasite-to-parasite communication 
in in vitro cultures, but whether ELVs mediate 
communication at the low parasite densities 
that occur in vivo remains to be determined. 
An intriguing possibility is that ELV-mediated 
sequestration of P. falciparum-infected RBCs 
onto the walls of blood vessels could lead to 
a local increase in parasite density and aid 
vesicle-mediated communication. 

Although the mechanism by which ELVs 
are formed is not known, it is well established 
that uninfected RBCs normally shed mem- 
brane vesicles, albeit at a much lower level 
than P. falciparum-infected RBCs. Vesicle 
shedding in uninfected RBCs is thought to be a 
mechanism for removing damaged proteins or 
membrane components””’ (Fig. 1). It is tempt- 
ing to speculate that the malaria parasite has 
exploited this process to enable cell-to-cell 
communication. Genetic studies performed 
by Regev-Rudzki et al. suggest that ELV formation
depends on the parasite protein PfPTP2,
which is associated with vesicles in the RBC 
50 Years Ago


Investigations were conducted 
here to determine the effect of
a magnetic field on the ripening
of green tomatoes (Lycopersicon 
esculentum Mill. Var. V. R. 
Moscow). Four permanent 
magnets of considerable strength 
were utilized. Fruits of uniform 
maturity were placed between 
the magnetic poles ... The 
ripening rates of treated fruits were 
compared with those of untreated 
controls in the same room under 
similar conditions ... In all cases 
the treated fruits ripened faster 
than the controls. Furthermore, 
the fruits nearest the magnetic 
south ripened faster than
those nearest the magnetic
north.

From Nature 6 July 1963 


100 Years Ago 


In recent issues of Nature several 
correspondents, in referring to 
the fact that a metal bedstead or
a few wires stretched a few feet
above the ground will make a 
wireless antenna, have overlooked 
a most important point, viz. that 
with such an antenna the ordinary 
methods of tuning are quite 
useless. A piece of wire netting 
suspended a few feet above the 
ground makes a most effective 
aerial, and enables one to receive 
loud signals from long-distance 
stations, but signals from Eiffel 
Tower, Cleethorpes, &c. will all
be mixed up, and the ordinary
tuner will not separate them 
effectively ... Wireless signals that 
are feeble when the surface of the 
earth is dry, becoming stronger 
after rain, and the well-known fact 
that these waves travel much better 
over sea than over land, all seem 
to indicate that the aerial waves 
are at least supplemented by 
waves that travel along the surface 
of the earth. 

From Nature 3 July 1913 
cytoplasm. These vesicles bud from membrane
sacks called Maurer’s clefts (found in infected 
RBCs) that regulate the transport of para- 
site proteins to the RBC membrane. PfPTP2 
might be directly involved in ELV budding or, 
alternatively, in regulating the packaging of 
parasite molecules into the membrane buds. 
Future studies could assess whether parasites 
that lack PfPTP2 are deficient in gametocyte
formation, or whether other parasite lipids, 
oligonucleotides or proteins, including those 
previously implicated in gametocytogen- 
esis'”’, are required for this process. 

Key questions remain, including: how are 
large biomolecules, such as DNA plasmids, 
transferred from the parasite nucleus or cyto- 
plasm to the ELVs, and then targeted to the 
nucleus of the recipient parasite? This process 
would require a tortuous journey across sev- 
eral membranes and compartments, includ- 
ing those of the RBC, which lacks its own 
trafficking machinery. It is conceivable that 
the nucleic-acid cargo is first packaged into 
double-membrane vesicles, in a process simi- 
lar to the formation of exosomes in animal 
cells. The vesicles could then transport their 
contents to the cytoplasm of the host RBC. 
Interestingly, analysis of timing of ELV forma- 
tion in the two studies raised the possibility 
that different classes of vesicles are generated 
during the parasite’s early ring stage and late 
schizont stage, and that these potentially involve 
different biogenic pathways. 

These studies highlight the potential impor- 
tance of ELVs in immune regulation and of 
parasite-to-parasite signalling in malaria, 
and suggest that they are crucial during acute 
infections and for efficient parasite transmis- 
sion. However, other functions could also be 
considered. For example, membrane blebbing 
might remove oxidized or damaged proteins 
or lipids and thereby help to maintain the 
integrity of an infected RBC for long enough 
to allow the parasite to complete its asexual 
replication cycle. The findings could also 
have technical implications, by providing new 
methods for rapidly generating genetically 
modified parasites and for the production 
of gametocytes. Importantly, the formation 
of ELVs represents a possible new target for 
antimalarial drugs. Successfully targeting 
this pathway might both reduce the severity 
of the infection and interrupt transmission. 
Field studies and mathematical modelling 
suggest that inhibiting transmission is cru- 
cial to achieving malaria eradication, and 
these findings may provide a boost to that 
long-term goal.
Enzymes activated by
synthetic components 


Synthetic analogues of the catalytic subsite of the hydrogen-producing enzyme 
HydA1 have been disappointingly inactive. The incorporation of such analogues
into the enzyme’s active site reveals the requirements for activity. SEE LETTER P.66 


RYAN D. BETHEL & 
MARCETTA Y. DARENSBOURG 


As our knowledge of biosynthetic
pathways evolves, experiments that
interfere at specific points in these
molecular assembly lines may be judiciously 
designed. On page 66 of this issue, Berggren 
et al.' report just such a strategy in their 
study of a [FeFe]-hydrogenase enzyme called 
HydA1, which mediates the remarkably 
efficient production of hydrogen gas from 
water-derived hydrogen ions**. The authors’ 
findings increase our understanding of how 
the enzyme’s active site is constructed”.

Buried deep within HydA1, the enzyme’s 
active site consists of a [4Fe-4S] cluster (a 
group of four iron and four sulphur atoms), 
which serves as a storage and conduit unit for 
electrons, and a [2Fe] subsite that is the real 
engine of the catalyst*. The [2Fe] subsite is 
actually a small molecule in which two iron 
atoms are bound by diatomic ligand mol- 
ecules (carbon monoxide and cyanide) and 
connected by a unique dithiolate bridge, 
SCH₂XCH₂S, where the identity of X has
been contentious, but could be carbon (CH₂),
oxygen or nitrogen (NH). The subsite is 
attached to the protein only at the embedded 
[4Fe-4S] cluster, through a bridge formed by 
the sulphur atom from a cysteine amino-acid 
residue. 

Organometallic chemists have made syn- 
thetic analogues of the subsite that resemble 
its structure, but these exhibit low catalytic 
activity for the hydrogen-forming reaction 


*This article and the paper under discussion’ were 
published online on 26 June 2013. 
in the absence of the protein. A fundamental
question is whether these analogues of a small 
molecule can be recognized at the appropri- 
ate point in the biological assembly of the 
active site of HydA1, and so be inserted into 
that site. If incorporated into the incomplete 
protein, would synthetic [2Fe] units be cata- 
lytically active? And could the insertion of 
synthetic analogues into the site be used as a 
technique to interrogate why their activity is 
low? More specifically, could this approach be 
used to identify the elusive bridgehead X of the 
dithiolate group? 

Although a lot is known about the events 
that control the generation and combination 
of the components of the [2Fe] subsite, and 
about the maturase proteins involved’, much 
remains to be clarified. Nevertheless, it is 
widely accepted that the [2Fe] subsite is built 
on a scaffold provided by the ‘apo’ (incom- 
plete) form of an isolable protein known as 
HydF (ref. 7). Once the [2Fe] unit is formed, 
the resulting ‘holo’ (complete) form of HydF 
serves as a delivery agent, shuttling its cargo 
to apo-HydA1, where the required [4Fe-4S] 
cluster resides at the end of a deep cavity’. 
On acceptance of the [2Fe] subunit, HydA1 
matures: the channel that provided access for 
the subunit collapses, generating the complete 
hydrogenase enzyme in which the [4Fe-4S] 
cluster and the [2Fe] subunit are fully encap- 
sulated. It has been postulated that cavity col- 
lapse causes one carbon monoxide ligand to 
be lost from the [2Fe] subunit and another 
to shift into a bridging position between the 
two irons. This would change the [2Fe] sub- 
unit from a symmetrical structure (akin to 
the structures of its synthetic analogues) to a 
‘rotated’ isomer that is catalytically active’.

Figure 1 | Assembly line interrupted. Berggren et al.' have bypassed much of the biosynthesis of the HydA1 enzyme, to introduce synthetic analogues of the
enzyme's [2Fe] subsite. They observed that the analogues first become incorporated into the apo (incomplete) HydF protein by binding to its [4Fe-4S] cluster,
and propose that a cyanide ligand acts as a bridge between iron atoms in the cluster and the subsite analogue. This temporary bridge is used during the transfer
of the [2Fe] unit to a similar [4Fe-4S] cluster in apo-HydA1: a sulphur atom (green) from a cysteine amino-acid residue in HydA1 latches onto the [2Fe] cargo,
releasing a carbon monoxide (CO) ligand from the subsite and yielding the mature form of HydA1.

Berggren et al. prepared apo-HydF from 
the bacterium Thermotoga maritima by over- 
expressing it in Escherichia coli bacteria, and 
then added it to three synthetic analogues of 
the [2Fe] subunit. The three analogues dif- 
fered only in the bridgehead atom X, which 
was carbon, nitrogen or oxygen. Fortunately, 
the diatomic ligands in the analogues can be 
easily detected using infrared spectroscopy. 
This enabled the researchers to track the arti- 
ficial subunits as they passed from solution 
into HydF, and eventually into HydA1. Even 
better, the spectroscopic signatures of syn- 
thetic and natural [2Fe] subsites are highly 
sensitive to changes in their environment. 
This allowed Berggren and colleagues to con- 
firm that HydF proteins did indeed bind to the 
synthetic [2Fe] subunits, and that the chemical 
environment of the diatomic ligands was simi- 
lar to that found in the isolated, natural form of 
the protein. 

The authors used another spectroscopic 
technique, electron paramagnetic resonance, 
to determine how the synthetic [2Fe] sub- 
sites are attached to the [4Fe-4S] clusters. 
This revealed an unexpected role for a cyan- 
ide ligand in the subsites. Cyanide ligands 
have long been known to aggregate metals, 
as in the widely used pigment Prussian blue. 
Berggren et al. suggest that this ability allows 
a cyanide ligand to act as a temporary con- 
nection between an iron atom in the subsite 
and one of those in a [4Fe-4S] cluster of apo- 
HydF (Fig. 1). As HydF transfers its pack- 
age to apo-HydA1, a cysteine sulphur atom 
binds the [2Fe] site to form mature HydA1, 


the cyanide detaches from the HydF cluster 
and a carbon monoxide ligand is lost from 
the subsite. 

So, are the semi-synthetic enzymes func- 
tional? To answer this question, Berggren and 
co-workers combined apo-HydA1 with one of 
the following: an empty scaffold (apo-HydF); 
HydF bound to the naturally occurring [2Fe] 
subsite; or HydF bound to each of the artifi- 
cial [2Fe] subsites. The authors found that no 
hydrogen gas was produced in reactions using 
apo-HydF, or for scaffolds bound to artificial 
[2Fe] subsites in which the bridgehead atom 
X was carbon or oxygen. However, when they 
tested the [2Fe] subsite that had a nitrogen 
bridgehead, they observed vigorous hydrogen- 
gas production, comparable to that of the nat- 
urally occurring enzyme. Notably, under the 
same assay conditions and in the absence of 
apo-HydA1, HydF loaded with the nitrogen- 
bridged subsite was inactive. 

The researchers’ findings raise the ques- 
tion: can apo-HydA1 be loaded with a syn- 
thetic [2Fe] analogue in the absence of HydF? 
Something similar has already been done: 
cofactors, such as iron-containing haem, that 
bind weakly to proteins have been replaced by 
geometrically similar synthetic catalysts'*"! 
simply by mixing the artificial cofactor and 
the apo-enzyme, without scaffold proteins. 
Whether the intricacies of apo-HydA1 will 
recognize and accept a simple synthetic [2Fe] 
subsite analogue alone is not yet known. 

The discovery of an active, semi-synthetic 
variant of HydA1 is an exciting result. It dem- 
onstrates that the complex maturase machin- 
ery used to construct the enzyme’s active site 
can be circumvented, and provides research- 
ers with a simple system for producing active 
[FeFe]-hydrogenases from various organ- 
isms. It also means that inorganic chemists 
were on the right track with their models of 
the [2Fe] subsite, even though those models 
had low catalytic activity — Berggren and col- 
leagues’ findings reveal that although a nitro- 
gen in the bridgehead is crucial for hydrogen 
formation, the subsite cannot function opti- 
mally until it is incorporated into the proper 
protein cavity. 
Comprehensive molecular characterization 
of clear cell renal cell carcinoma 


The Cancer Genome Atlas Research Network* 


Genetic changes underlying clear cell renal cell carcinoma (ccRCC) include alterations in genes controlling cellular oxygen 
sensing (for example, VHL) and the maintenance of chromatin states (for example, PBRM1). We surveyed more than 400 
tumours using different genomic platforms and identified 19 significantly mutated genes. The PI(3)K/AKT pathway was 
recurrently mutated, suggesting this pathway as a potential therapeutic target. Widespread DNA hypomethylation was 
associated with mutation of the H3K36 methyltransferase SETD2, and integrative analysis suggested that mutations 
involving the SWI/SNF chromatin remodelling complex (PBRM1, ARID1A, SMARCA4) could have far-reaching effects 
on other pathways. Aggressive cancers demonstrated evidence of a metabolic shift, involving downregulation of genes 
involved in the TCA cycle, decreased AMPK and PTEN protein levels, upregulation of the pentose phosphate pathway and 
the glutamine transporter genes, increased acetyl-CoA carboxylase protein, and altered promoter methylation of miR-21 
(also known as MIR21) and GRB10. Remodelling cellular metabolism thus constitutes a recurrent pattern in ccRCC that 
correlates with tumour stage and severity and offers new views on the opportunities for disease treatment. 


Kidney cancers, or renal cell carcinomas (RCC), are a common group 
of chemotherapy-resistant diseases that can be distinguished by his- 
topathological features and underlying gene mutations’. Inherited 
predisposition to RCC has been shown to arise from genes involved 
in regulating cellular metabolism, making RCC a model for the role of 
an oncologic-metabolic shift, commonly referred to as the ‘Warburg 
effect’, leading to malignancy’. The most common type of RCC, clear 
cell renal cell carcinoma (ccRCC), is closely associated with VHL gene 
mutations that lead to stabilization of hypoxia inducible factors (HIF- 
1α and HIF-2α, also known as HIF1A and EPAS1) in both sporadic 
and familial forms. PBRM1, a subunit of the PBAF SWI/SNF chro- 
matin remodelling complex, as well as histone deubiquitinase BAP1 
and histone methyltransferase SETD2, were recently found to be 
altered in ccRCC’”, implicating major roles for epigenetic regulation 
of additional functional pathways participating in the development 
and progression of the disease. Oncogenic metabolism and epigenetic 
reprogramming have thus emerged as central features of ccRCC. 

In the present study, clinical and pathological features, genomic altera- 
tions, DNA methylation profiles, and RNA and proteomic signatures 
were evaluated in ccRCC. We accrued more than 500 primary nephrec- 
tomy specimens from patients with histologically confirmed ccRCC 
that conformed to the requirements for genomic study defined by the 
Cancer Genome Atlas (TCGA), together with matching ‘normal’ geno- 
mic material. Samples were restricted to those that contained at least 60% 
tumour nuclei (median 85%) by pathological review (clinical data sum- 
mary provided in Supplementary Table 1). A data freeze representing 446 
samples was generated from at least one analytical platform (‘Extended’ 
data set) and data from all platforms were available for 372 samples for 
coordinated, integrative analyses (‘Core’ data set) (Supplementary Data 1, 
Supplementary Table 2). No substantial batch effects in the data that 
might confound analyses were detected (Supplementary Figs 1-20). 


Somatic alterations 

The global pattern of somatic alterations, determined from analysis 
of 417 samples, is shown in Fig. 1a. DNA hybridizations showed 
that recurrent arm-level and focal somatic copy number alterations 


(SCNAs) occurred at fewer sites than is generally observed in other 
cancers (P < 0.0004; Supplementary Figs 21-22 and Supplementary 
Table 3). However, SCNAs that were observed more commonly 
involved entire chromosomes or chromosome arms, rather than focal 
events (17% vs 0.4%, Fig. 1b). Notably, the most frequent arm-level 
events involved loss of chromosome 3p (ref. 6; 91% of samples), 
encompassing all of the four most commonly mutated genes (VHL, 
PBRM1, BAP1 and SETD2). 

The data also suggested lower and more variable tumour cellularity’ 
in the accrued samples, compared to conventional pathological review 
(median 54% ± 14%). This may reflect stromal or endothelial cell 
contributions, or tumour cell heterogeneity. A recent study of multiple 
samples from single tumours has demonstrated significant regional 
genomic heterogeneity, but with shared mutations in frequently mutated 
genes and convergent evolution of other common gene level events*. The 
mutation frequencies of key genes (VHL, PBRM1 and so on), as well as 
copy number gains and losses found here, were, however, consistent 
with previous reports. Tumour purity was therefore not determined to 
be a limitation in the current study. 

Arm level losses on chromosome 14q, associated with loss of 
HIF1A, which has been predicted to drive more aggressive disease’, 
were also frequent (45% of samples). Gains of 5q were observed (67% 
of samples) and additional focal amplifications refined the region of 
interest to 60 genes in 5q35, which was particularly informative as 
little has been known about the importance of this region in ccRCC 
since the 5q gain was initially described. Focal amplification also 
implicated the protein kinase C member PRKCI (ref. 10), and the 
MDS1 and EVI1 complex locus MECOM at 3p26, the p53 regulator 
MDM4 at 1q32, MYC at 8q24 and JAK2 on 9p24. Focally deleted 
regions included the tumour suppressor genes CDKN2A at 9p21 and 
PTEN at 10q23, putative tumour suppressor genes NEGR1 at 1p31, 
QKI at 6q26, and CADM2 at 3p12 and the genes that are frequently 
deleted in cancer, PTPRD at 9p23 and NRXN3 at 14q24 (ref. 11). 

Whole-exome sequencing (WES) of tumours from 417 patients 
identified 36,353 putative somatic mutations, including 16,821 missense 
mutations, 6,383 silent mutations and 2,999 indels, with an average of 


*Lists of participants and their affiliations appear at the end of the paper. 
Figure 1 | Somatic alterations in ccRCC. a, Top histogram, mutation events 
per sample; left histogram, samples affected per alteration. Upper heat map, 
distribution of fusion transcripts and VHL methylation across samples 

(n = 385 samples, with overlapping exome/SCNA/RNA-seq/methylation 
data); middle heat map, mutation events; bottom heat map, copy number gains 
(red) and losses (blue). Lower chart, mutation spectrum by indicated categories. 


1.1 ± 0.5 non-silent mutations per megabase (Supplementary Figs 23- 
25). Mutations from 50 genes with high apparent somatic mutation 
frequencies (Supplementary Table 4) were independently validated 
using alternative sequencing instrumentation (Supplementary Fig. 26). 
In tumours from 22 patients, whole-genome sequencing was also used to 
validate and calibrate the WES data and confirmed 83% of the WES 
mutation-calls (Supplementary Tables 5 and 6). In line with results of 
previous studies (Supplementary Tables 7 and 8), the validated mutation 
data identified nineteen significantly mutated genes (SMGs) (false dis- 
covery rate (FDR) < 0.1), with VHL, PBRM1, SETD2, KDM5C, PTEN, 
BAP1, MTOR and TP53 representing the eight most extreme members 
(q < 0.00001) (Fig. 1a). Eleven additional SMGs were of considerably 
lower significance (q<0.1-0.5) but included known cancer genes. 
Among all SMGs, only mutation of BAP1 correlated with poor survival 
outcome (Supplementary Fig. 27)'*. Approximately 20% of cases had 
none of the 19 recorded SMGs, although many contained rare muta- 
tions in other known oncogenes or tumour suppressors, involving 
survival associations, illustrating the genetic complexity of ccRCC® 
(Supplementary Figs 28-30 and Supplementary Table 9). 
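
As an illustration of how a false discovery rate threshold such as the FDR < 0.1 cutoff above can be applied, the following is a minimal sketch of the standard Benjamini-Hochberg adjustment. It is not the study's mutation-significance pipeline; the gene names and P values are hypothetical placeholders, and the choice of Benjamini-Hochberg as the q-value method is an assumption made for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as the input."""
    m = len(p_values)
    # Indices of the tests, sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical per-gene p-values (placeholders, not data from the paper).
p = {"GENE_A": 0.0001, "GENE_B": 0.004, "GENE_C": 0.03,
     "GENE_D": 0.2, "GENE_E": 0.9}
q = dict(zip(p, benjamini_hochberg(list(p.values()))))

# Genes passing an FDR threshold of 0.1.
significant = [gene for gene, value in q.items() if value < 0.1]
print(significant)
```

Because the adjustment depends on the number of tests, the same raw P value can pass or fail the threshold depending on how many genes are tested alongside it.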

Eighty-four putative RNA fusions were identified in 416 ccRCC 
samples’*. Eleven of thirteen predicted events (Fig. 1c) were validated 
using targeted methods, consistent with an 85% true-positive rate 
(Supplementary Table 10 and Supplementary Figs 31-35). A recurrent 
SFPQ-TFE3 fusion (previously linked to non-clear cell translocation- 
associated RCC) was found in five samples, all of which were VHL 
wild type, indicating either that these tumours are a clear cell variant or 
that translocation-associated renal tumours may be histologically 
indistinguishable from conventional ccRCC. Furthermore, the TFE3 
protein as well as an X(p11) rearrangement was found in three of those 
samples, where there were available slides. 


DNA methylation profiles 


We observed epigenetic silencing of VHL in about 7% of ccRCC tumours, 
which was mutually exclusive with mutation of VHL (Fig. 1a), reflecting 
the central role of this locus in ccRCC”. An additional 289 genes showed 
b, Left panel, frequency of arm-level copy-number alterations versus focal copy 
number alterations. Right panel, comparison of the average numbers of arm- 
level and focal copy-number changes in ccRCC, colon cancer (CRC), 
glioblastoma (GBM), breast cancer (BRCA) and ovarian cancer (OVCA). 

c, Circos plot of fusion transcripts identified in 416 samples of ccRCC, with 
recurrent fusions highlighted. 


evidence of epigenetic silencing in at least 5% of tumours. The top- 
ranked gene by inverse correlation between gene expression and 
DNA methylation was UQCRH, hypermethylated in 36% of the 
tumours. UQCRH has been previously suggested to be a tumour 
suppressor’®, but not linked to ccRCC. Interestingly, increasing pro- 
moter hypermethylation frequency correlated with higher stage and 
grade (Fig. 2a, b). 

We also evaluated the global consequences of mutation in specific 
epigenetic modifiers. Mutations in SETD2, a non-redundant H3K36 
methyltransferase, were associated with increased loss of DNA methy- 
lation at non-promoter regions (Fig. 2c, d). This discovery is consistent 
with the emerging view that H3K36 trimethylation may be involved in 
the maintenance of a heterochromatic state!’, whereby DNA methyl- 
transferase 3A (DNMT3A) binds H3K36me3 and methylates nearby 
DNA”. Thus, reductions of H3K36me3 through SETD2 inactivation 
could lead indirectly to regional loss of DNA methylation. 


RNA expression 


Unsupervised clustering methods identified four stable subsets in 
both mRNA (m1-m4) and miRNA (mi1-mi4) expression data sets 
(Fig. 3a and Supplementary Figs 36-39). Supervised clustering revealed 
the similarity of these new mRNA classes to the previously reported ccA 
and ccB expression subtypes’’, with cluster m1 corresponding to ccA 
and ccB divided between m2 and m3 (Supplementary Table 11). Cluster 
m4 probably accounts for the roughly 15% of tumours previously 
unclassified in the ccA/ccB classification scheme. Similarly, the survival 
advantage previously observed for ccA cases was again identified for 
m1 tumours (Fig. 3b). 

The m1 subtype was characterized by gene sets associated with 
chromatin remodelling processes and a higher frequency of PBRM1 
mutations (39% in m1 vs 27% in others, P = 0.027). Deletion of 
CDKN2A (53% vs 26%; P < 0.0001) and mutations in PTEN (11% 
vs 1%; P < 0.0001) were more frequent in m3 tumours (Supplementary 
Fig. 5). The m4 group showed higher frequencies of BAP1 mutations 
(17% vs 7%; P = 0.002) and base-excision repair; however, this group 
Figure 2 | DNA methylation and ccRCC. 
kidney tissue and normal white blood cells 
also harboured more mTOR mutations (12% vs 4%; P = 0.01) and 
ribosomal gene sets. 

Survival differences evident in miRNA-based subtypes (Supplemen- 
tary Figs 40-44) correlated with the mRNA data (Fig. 3b-d). For 
example, miR-21, previously shown to demonstrate strong regulatory 
interactions in ccRCC™ and with established roles in metabolism'”*!”” 
correlated strongly with worse outcome, and DNA promoter methyla- 
tion levels inversely correlated with expression of miR-21, miR-10b 
and miR-30a (Supplementary Tables 12-14). miRNA interactions 
thus represent a significant component of the epigenetic regulation 
observed in ccRCC. 


Integrative data analyses 


We used a combination of approaches for integrative pathway analysis. 
The HotNet** algorithm uses a heat diffusion model, to find sub- 
networks distinguished by both the frequency of mutation in genes 
(nodes in the network) and the topology of interactions between genes 
(edges in the network). In ccRCC, HotNet identified twenty-five sub- 
networks of genes within a genome-scale protein-protein interaction 
network (Supplementary Table 15 and Supplementary Fig. 45). The 
largest and most frequently mutated network contained VHL and inter- 
acting partners. The second most frequently mutated sub-network 
included PBRM1, ARID1A and SMARCA4, key genes in the PBAF 
SWI/SNF chromatin remodelling complex. 
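
The idea of combining mutation frequency (node heat) with network topology (edges) can be sketched with a simplified diffusion on a toy graph. This is not the HotNet implementation: the update rule below is a basic diffusion-with-retention scheme chosen for illustration, and the edges, heat values, placeholder nodes (GENE_X, GENE_Y) and the 0.05 threshold are all hypothetical.

```python
def diffuse_heat(adjacency, initial_heat, retain=0.5, steps=50):
    """Spread per-gene heat over an undirected network.

    At every step each gene keeps `retain` of its starting heat and
    receives the rest from its neighbours, each of which splits its
    current heat evenly across its own edges.
    """
    heat = dict(initial_heat)
    for _ in range(steps):
        updated = {}
        for node in adjacency:
            inflow = sum(heat[nb] / len(adjacency[nb])
                         for nb in adjacency[node])
            updated[node] = (retain * initial_heat[node]
                             + (1 - retain) * inflow)
        heat = updated
    return heat


# Toy undirected network; the edges are illustrative only.
adjacency = {
    "PBRM1": ["ARID1A", "SMARCA4"],
    "ARID1A": ["PBRM1", "SMARCA4"],
    "SMARCA4": ["PBRM1", "ARID1A"],
    "GENE_X": ["GENE_Y"],
    "GENE_Y": ["GENE_X"],
}
# Hypothetical starting heat (for example, a mutation frequency per gene).
initial_heat = {"PBRM1": 0.30, "ARID1A": 0.0, "SMARCA4": 0.02,
                "GENE_X": 0.0, "GENE_Y": 0.0}

final_heat = diffuse_heat(adjacency, initial_heat)
# Genes whose diffused heat exceeds an arbitrary threshold.
hot_genes = sorted(g for g, h in final_heat.items() if h > 0.05)
print(hot_genes)
```

In this toy case a gene with no starting heat (ARID1A) still ends up hot because it sits next to a frequently altered neighbour, whereas the disconnected pair of cold genes stays cold.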

We also inferred activities for known pathways, by using the PARADIGM 
algorithm to incorporate mutation, copy and mRNA expression data, 
with pathway information catalogued in public databases. This method 
identified a highly significant sub-network of 2,398 known regulatory 
interactions, connecting 1,218 molecular features (645 distinct proteins) 
(Supplementary Figs 46-49 and Supplementary Tables 16 and 17). 
Several ‘active’ transcriptional ‘hubs’ were identified, by searching for 
transcription factors with targets that were inferred to be active in the 
PARADIGM network. The active hubs found included HIF1A/ARNT, 
the transcription factor program activated by VHL mutation, as well as 
MYC/MAX, SP1, FOXM1, JUN and FOS. These hubs, together with 
d, Heat map showing CpG loci with SETD2 
mutation-associated DNA methylation (from part 
c); blue to red indicates low to high DNA 
methylation. The loci are split into those 
hypomethylated (top panel; n = 1,251) or 
hypermethylated (bottom panel; n = 1,306) in 
SETD2 mutants. Top colour bars indicate SETD2 
mRNA expression (red: high, green: low) and 
SETD2 mutation status. Grey-scale row-side colour 
bar on left-hand side represents the relative number 
of overlapping reads, based on H3K36me3 ChIP-seq 
experiment in normal adult kidney 
(http://nihroadmap.nih.gov/epigenomics/); black, high 
read count. DNA methylation patterns include 14 
normal kidney samples. Among the tumours 
without SETD2 mutations, six (arrowhead) have 
both the signature pattern of SETD2 mutation and 
low SETD2 mRNA expression. 
several other less well-studied transcription factors, interlink much of the 
transcriptional program promoting glycolytic shift, de-differentiation 
and growth promotion in ccRCC. 

We next searched for causal regulatory interactions connecting 
ccRCC somatic mutations to these transcriptional hubs, using a bi- 
directional extension to HotNet (‘TieDIE’) and identified a chromatin- 
specific sub-network (Fig. 4a and Supplementary Figs 50-52). TieDIE 
defines a set of transcriptional targets, whose state in the tumour cells is 
proposed to be influenced by one or more of the significantly mutated 
genes. The chromatin modification pathway intersects a wide variety of 
processes, including the regulation of hormone receptors (for example, 
ESR1), RAS signalling via the SRC homologue (SHC1), immune-related 
signalling (for example, NFKB1 and IL6)™, transcriptional output (for 
example, HIF1A, JUN, FOS and SP1), DNA repair (via BAP1) and beta- 
catenin (CTNNB1) and transforming growth factor (TGF)-β (TGFBR2) 
signalling via interactions with a SMARC-PBRM1-ARID1A complex. 
The complexity of these interactions reflects the potential for highly pleio- 
tropic effects following primary events in chromatin modification genes. 

The mutations in the chromatin regulators PBRM1, BAP1 and 
SETD2 were differentially associated with altered expression patterns 
of large numbers of genes when compared to samples bearing a 
background of VHL mutation (Supplementary Tables 18-21 and Sup- 
plementary Fig. 53). Each chromatin regulator had a distinct set of 
downstream effects, reflecting diverse roles for chromatin remo- 
delling in the transcriptome. 

Additionally, an unsupervised pathway analysis using the MEMo 
algorithm” identified mutually exclusive patterns of alterations target- 
ing multiple components of the PI(3)K/AKT/MTOR pathway in 28% 
of the tumours (Fig. 4b and Supplementary Table 22). Interestingly, the 
altered gene module included two genes from the broad amplicon on 
5q35.3: GNB2L1 and SQSTM1. Both these genes have previously been 
associated with activation of PI(3)K signalling**””. Furthermore, mRNA 
expression levels of these two genes were correlated with both DNA 
copy number increases and alteration status of the PI(3)K pathway 
(Supplementary Figs 54-55). The mutual exclusivity module also includes 
Figure 3 | mRNA and miRNA patterns reflect molecular subtypes of 
ccRCC. a, Tumours were separated into four sample groups (that is, ‘clusters’) 
by unsupervised analyses, on the basis of either differentially expressed mRNA 
patterns (left panel, showing 500 representative genes: m1-m4) or differentially 
expressed miRNA patterns (right panel, showing 26 representative miRNAs: 
mi1-mi4). b, Significant differences in patient survival were identified for both 
the mRNA-based clusters (left panel) and the miRNA-based clusters (right 
panel). c, Numbers of samples overlapping between the two sets of clusters, 
with significant concordance observed between m1 and mi3 and between m3 
and mi2; red, significant overlap (P< 10°? chi-squared test). d, mRNA- 
miRNA correlations, for predicted targeting interactions. Rows indicate 
miRNAs from a (indicated by cluster-specific colour bar); columns, mRNAs 
(5,000 differentially regulated genes selected for average RPKM > 10 and at 
least one predicted miRNA interaction); mRNA-miRNA entries with no 
predicted targeting are white. To the right of the correlation matrix, t statistics 
(Spearman’s rank) indicate group target enrichment. 


frequent overexpression of EGFR, which correlates with increased 
phosphorylation of the receptor (Supplementary Fig. 56), and which 
has been previously associated with lapatinib response in ccRCC”. 
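
To make the notion of a mutually exclusive alteration module concrete, the sketch below computes two simple summary numbers for a candidate gene set: the fraction of tumours with at least one altered gene (coverage) and the fraction of those tumours altered in exactly one gene (exclusivity). This is only an illustration of the concept, not the MEMo algorithm, and the sample assignments and cohort size are hypothetical.

```python
def coverage_and_exclusivity(alterations, n_samples):
    """Summarize a candidate gene module.

    `alterations` maps each gene to the set of sample ids in which it is
    altered. Returns (coverage, exclusivity): the fraction of all samples
    hit at least once, and the fraction of hit samples hit exactly once.
    """
    hits_per_sample = {}
    for samples in alterations.values():
        for sample in samples:
            hits_per_sample[sample] = hits_per_sample.get(sample, 0) + 1
    covered = len(hits_per_sample)
    exclusive = sum(1 for count in hits_per_sample.values() if count == 1)
    coverage = covered / n_samples
    exclusivity = exclusive / covered if covered else 0.0
    return coverage, exclusivity


# Hypothetical alteration calls in a cohort of 20 tumours.
module = {
    "PTEN": {1, 2},
    "MTOR": {3, 4, 5},
    "EGFR": {5, 6},
}
coverage, exclusivity = coverage_and_exclusivity(module, n_samples=20)
print(coverage, exclusivity)
```

A module with high coverage and exclusivity close to 1 is the pattern expected when altering any one member of a pathway is sufficient, so that additional hits in the same tumour are rare.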


Correlations with survival 
Where unsupervised analyses had indicated that common molecular 
patterns were associated with patient survival, we sought to further 
define molecular prognostic signatures at the levels of mRNA, miRNA, 
DNA methylation and protein. Data were divided into ‘discovery’ 
(n = 193) and ‘validation’ (n = 253) sets and platform-specific signa- 
tures were defined using Cox analyses**. Kaplan-Meier analysis for each 
signature showed statistically significant associations with survival in 
the validation subset (Fig. 5a and Supplementary Fig. 57). Multivariate 
Cox analyses, incorporating established clinical variables, showed that 
the mRNA, miRNA and protein signatures provided additional pro- 
gnostic power (Supplementary Table 23). In addition, these signatures 
could provide molecular clues as to the drivers of aggressive cancers. 
Top protein correlates of worse survival included reduced AMP- 
activated kinase (AMPK) and increased acetyl-CoA carboxylase (ACC) 
Figure 4 | Genomically-altered pathways in ccRCC. a, Alterations in 
chromatin remodelling genes were predicted to affect a large network of genes 
and pathways (larger implicated network in Supplementary Information). Each 
gene is depicted as a multi-ring circle with various levels of data, plotted such 
that each ‘spoke’ in the ring represents a single patient sample (same sample 
ordering for all genes). ‘PARADIGM’ ring, bioinformatically inferred levels of 
gene activity (red, higher activity); ‘Expression’, mRNA levels relative to normal 
(red, high); ‘Mutation’, somatic event; centre, correlation of gene expression or 
activity to mutation events in chromatin-related genes (red, positive). Protein- 
protein relationships inferred using public resources. b, For the PI(3)K/AKT/ 
MTOR pathway (altered in ~28% of tumours), the MEMo algorithm identified 
a pattern of mutually exclusive gene alterations (somatic mutations, copy 
alterations and aberrant mRNA expression) targeting multiple components, 
including two genes from the recurrent amplicon on 5q35.3. The alteration 
frequency and inferred alteration type (blue for inactivation and red for 
activation) is shown for each gene in the pathway diagram. 


(Supplementary Fig. 58). Together, downregulation of AMPK and 
upregulation of ACC activity contribute to a metabolic shift towards 
increased fatty acid synthesis’. A metabolic shift to an altered use of key 
metabolites and pathways was also apparent when considering the full 
set of genes involved in the core metabolic processes, including a shift 
towards a ‘Warburg effect’-like state (Fig. 5b). Poor prognosis correlated 
with downregulation of AMPK complex and the Krebs cycle genes, and 
with upregulation of genes involved in the pentose phosphate pathway 
Figure 5 | Molecular correlates of patient survival involve metabolic 
pathways. a, Sample profiles were separated into discovery and validation 
subsets, with the top survival correlates within the discovery subset being 
defined for each of the four platforms examined (mRNA, microRNA, protein, 
DNA methylation). Kaplan-Meier plots show results of applying the four 
prognostic signatures to the validation subset, comparing survival for patients 
with predicted higher risk (red, top third of signature scores), lower risk (blue, 
bottom third) or intermediate risk (grey, middle third); successful predictions 
were observed in each case. b, When viewed in the context of metabolism, the 
molecular survival correlates highlight a widespread metabolic shift, with 
tumours altering their usage of key pathways and metabolites (red and blue 


(G6PD, PGLS, TALDO (also known as TALDOIP1), TKT) and fatty 
acid synthesis (FASN, ACC (also known as ACACA)). 
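
The risk grouping applied to the validation subset (top, middle and bottom thirds of signature scores, Fig. 5a) can be sketched as follows. This is a minimal illustration that assumes each patient already has a single numeric signature score; how the scores are derived from the Cox analyses is not reproduced here, and the sample ids and score values are hypothetical.

```python
def assign_risk_groups(scores):
    """Label samples 'low', 'intermediate' or 'high' by score tertile.

    `scores` maps sample id to a signature score, where a higher score
    is taken to mean higher predicted risk.
    """
    ranked = sorted(scores, key=scores.get)
    n = len(ranked)
    third = n // 3
    groups = {}
    for position, sample in enumerate(ranked):
        if position < third:
            groups[sample] = "low"            # bottom third of scores
        elif position >= n - third:
            groups[sample] = "high"           # top third of scores
        else:
            groups[sample] = "intermediate"   # everything in between
    return groups


# Hypothetical signature scores for six validation samples.
scores = {"S1": -1.2, "S2": 0.4, "S3": 2.1,
          "S4": -0.3, "S5": 1.5, "S6": 0.0}
groups = assign_risk_groups(scores)
print(groups)
```

The three groups would then be compared with Kaplan-Meier curves; when the cohort size is not divisible by three, this sketch places the remainder in the intermediate group.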

Examination of potential genetic or epigenetic drivers of a glycolytic 
shift led us to identify methylation events involving MIR21 and GRB10, 
with decreased promoter methylation of each gene (thereby higher 
expression) being associated with worse or better outcome, respect- 
ively (Fig. 5b, Supplementary Fig. 59 and Supplementary Table 24). 
Both genes regulate the PI(3)K pathway: miR-21 is inducible by high 
glucose levels and downregulates PTEN”; whereas the tumour sup- 
pressor GRB10 negatively regulates PI(3)K and insulin signalling”. 
Promoter methylation of MIR21 and GRB10 were coordinated with 
their mRNA expression patterns, as well as with the mRNA expression 
of other key genes and protein expression in the metabolic pathways 
(Fig. 5c and Supplementary Fig. 60). In addition to the PI(3)K pathway 
(Fig. 5b and Supplementary Fig. 61), molecular survival correlations 
involved several pro-metastatic matrix metalloproteinases (Supplemen- 
tary Fig. 62). 


Discussion 


Our study sampled a single site of the primary tumour, in a disease 
with a potentially high level of tumour heterogeneity*. The extent to 
which convergent evolutionary events are a common theme in ccRCC 
remains to be determined, but may indicate that critical genes will be 
represented across the tumour landscape for an individual mass. In 
general, the large sample size seemed to overcome the intrinsic chal- 
lenges of studying a genetically complex disease, revealing rare variants 
shading representing the correlation of increased gene expression with worse or 
better survival respectively, univariate Cox based on extended cohort). Worse 
survival correlates with upregulation of pentose phosphate pathway genes 
(G6PH, PGLS, TALDO and TKT), fatty acid synthesis genes (ACC and FASN), 
and PI(3)K pathway enhancing genes (MIR21). Better survival correlates with 
upregulation of AMPK complex genes, multiple Krebs cycle genes and PI(3)K 
pathway inhibitors (PTEN, TSC2). Additionally, specific promoter methylation 
events, including hypermethylation of PI(3)K pathway repressor GRB10, 
associate with outcome. c, Heat map of selected key features from the metabolic 
shift schematic (b) demonstrating coordinate expression by stage at DNA 
methylation, RNA, and protein levels (data from validation subset). 


at rates similar to what has been described previously’. The samples, 
taken from primary tumour specimens, were reflective of patients fit 
for either definitive or cytoreductive nephrectomy, whereas future work 
could explore the genomic landscape of metastatic lesions. 

Pathway and integrated analyses highlighted the importance of 
the well-known VHL/HIF pathway, the newly emerging chromatin 
remodelling/histone methylation pathway, and the PI(3)K/AKT path- 
way. The observation of chromatin modifier genes being frequently 
mutated in ccRCC strongly supports the model of nucleosome dynamics, 
providing a key function in renal tumorigenesis. Although the mech- 
anistic details remain to be defined as to how such modulation promotes 
tumour formation, the data presented here revealed alterations in DNA 
methylation associated with SETD2 mutations. As an epigenetic process 
that can potently modify many transcriptional outputs, these mutational 
events have the potential to change the landscape of the tumour genome 
through altered expression of global sets of genes and genetic elements. 
Molecular correlates of patient survival further implicated PI(3)K/AKT 
as having a role in tumour progression, involving specific DNA methy- 
lation events. The PI(3)K/AKT pathway presents a strong therapeutic 
target in ccRCC, supporting the potential value of MTOR and/or 
related pathway inhibitor drugs for this cancer*!””. 

Cross-platform molecular analyses indicated a correlation between 
worsened prognosis in patients with ccRCC and a metabolic shift 
involving increased dependence on the pentose phosphate shunt, 
decreased AMPK, decreased Krebs cycle activity, increased glutamine 
transport and fatty acid production. These findings are consistent 
with the isotopomer spectral analysis of a pair of VHL−/− clear cell 
kidney cancer cell lines, both of which were notably derived from 
patients with aggressive, metastatic disease, which revealed a depend- 
ence on reductive glutamine metabolism for lipid biosynthesis*’. The 
metabolic shift identified in poor prognosis ccRCC remarkably mirrors 
the Warburg metabolic phenotype (increased glycolysis, decreased 
AMPK, glutamine-dependent lipogenesis) identified in type 2 pap- 
illary kidney cancer characterized by mutation of the Krebs cycle 
enzyme, fumarate hydratase**. Further studies to dissect out the role 
of the commonly mutated chromosome 3 chromatin remodelling 
genes, PBRM1, SETD2 and BAP1, in ccRCC tumorigenesis and their 
potential role in the metabolic remodelling associated with progression 
of this disease will hopefully provide the foundation for the develop- 
ment of effective forms of therapy for this disease. 


METHODS SUMMARY 


Specimens were obtained from patients, with appropriate consent from insti- 
tutional review boards. Using a co-isolation protocol, DNA and RNA were purified. 
In total, 446 patients were assayed on at least one molecular profiling platform. 
The platforms included: (1) RNA sequencing, (2) DNA methylation arrays, (3) 
miRNA sequencing, (4) Affymetrix single nucleotide polymorphism (SNP) arrays, 
(5) exome sequencing, and (6) reverse phase protein arrays. As described above 
and in the Supplementary Methods, both single platform analyses and integrated 
cross-platform analyses were performed. 

53BP1 is a reader of the DNA-damage- 
induced H2A Lys 15 ubiquitin mark 

53BP1 (also called TP53BP1) is a chromatin-associated factor that promotes immunoglobulin class switching and DNA 
double-strand-break (DSB) repair by non-homologous end joining. To accomplish its function in DNA repair, 53BP1 
accumulates at DSB sites downstream of the RNF168 ubiquitin ligase. How ubiquitin recruits 53BP1 to break sites remains 
unknown as its relocalization involves recognition of histone H4 Lys 20 (H4K20) methylation by its Tudor domain. Here 
we elucidate how vertebrate 53BP1 is recruited to the chromatin that flanks DSB sites. We show that 53BP1 recognizes 
mononucleosomes containing dimethylated H4K20 (H4K20me2) and H2A ubiquitinated on Lys 15 (H2AK15ub), the 
latter being a product of RNF168 action on chromatin. 53BP1 binds to nucleosomes minimally as a dimer using its 
previously characterized methyl-lysine-binding Tudor domain and a carboxy-terminal extension, termed the ubiquitination- 
dependent recruitment (UDR) motif, which interacts with the epitope formed by H2AK15ub and its surrounding residues 
on the H2A tail. 53BP1 is therefore a bivalent histone modification reader that recognizes a histone ‘code’ produced by DSB 
signalling. 


DNA double-strand breaks (DSBs) elicit a cascade of protein recruit- 
ment on the chromatin surrounding DNA lesions that regulates DNA 
damage repair and signalling’*. 53BP1 is an important effector of this 
DSB response, as it promotes repair by non-homologous end joining 
(NHEJ)* by opposing DNA end resection’, the initiating step in homo- 
logous recombination. In mice, 53BP1 is necessary for immuno- 
globulin class switching*® and dysfunctional telomere fusions’, two 
processes that rely on NHEJ. Furthermore, 53BP1 deficiency in mice 
leads to a near-complete reversal of the phenotypes associated with loss 
of BRCA1, including tumorigenesis**. 53BP1 must accumulate on the 
chromatin surrounding DSBs to accomplish its functions’. At the 
molecular level, 53BP1 acts as a recruitment platform for RIF1, its 
effector protein during DSB repair by NHEJ’°’. 53BP1 accumulation 
at DSB sites, as monitored by formation of ionizing radiation (IR)- 
induced subnuclear foci, requires the recognition of histone methyla- 
tion, in particular H4K20me2 (ref. 14), by its tandem Tudor domain!* 
(Fig. 1a). However, the formation of 53BP1 foci also requires the RNF168 
E3 ligase’”, raising the question of how a ubiquitin ligase promotes the 
accumulation of a methylated histone binding protein at sites of DNA 
damage. The current models of 53BP1 recruitment to DSB sites pro- 
pose that H4K20 methylation is either induced or becomes available 
for 53BP1 binding after DNA damage’*””. For example, it has been 
proposed that JMJD2A and L3MBTL1, which bind to H4K20me2, are 
removed in a ubiquitination-dependent manner from the chromatin 
surrounding DSB sites, to allow 53BP1 binding’*”’. In aggregate, these 
models indicate that increased accessibility of H4K20me2 at DSB sites 
might be sufficient to trigger 53BP1 recruitment. 


Identification of the 53BP1 UDR 


We reasoned that if the above model was strictly correct, the 53BP1 
orthologue from fission yeast, Crb2, should also form IR-induced foci 
in human cells. Indeed, Crb2 contains a tandem Tudor domain that 
binds to H4K20me2 (Fig. 1a)’*. Crb2 accumulates at DSB sites in an 
H4K20me2-binding-dependent manner’®”°, but fission yeast does 
not have a recognizable RNF168 homologue, as it arose later during 
evolution. When expressed in human cells as a GFP fusion, Crb2 failed 
to form IR-induced foci whereas 53BP1 formed foci that co-localized 
with γ-H2AX (Fig. 1b). As expected, the accumulation of 53BP1 at DSB 
sites was dependent on H4K20me2 recognition because the 53BP1 
D1521R mutation, which disrupts this activity of the Tudor domain, 
impaired the ability of 53BP1 to form IR-induced foci (Fig. 1b). The 
inability of Crb2 to accumulate at DSB sites in human cells was not due 
to a failure of Crb2 to interact with human H4K20me2, as it associated 
with human chromatin in a Tudor-dependent manner, as determined 
by fluorescence recovery after photobleaching (FRAP) (Supplementary 
Fig. 2a–d) and cellular subfractionation (Supplementary Fig. 2e). These 
experiments suggested that 53BP1 recruitment to break sites might be 
largely independent of an increased accessibility of H4K20me2 in 
damaged chromatin. 

These observations provided an opportunity to map the region that 
endows 53BP1 with the ability to accumulate at DSB sites in an RNF168- 
dependent manner. We refer to this putative region as the ubiquitination- 
dependent recruitment (UDR) motif. We thus prepared various chimaeras 
between Crb2 and the minimal focus-forming region (FFR) of 53BP1, 
which consists of the Tudor domain flanked by an amino-terminal 
oligomerization region and a C-terminal extension”'”* (that is, 53BP1 
residues 1220-1711; Fig. 1a). We separated the 53BP1(FFR) and Crb2 
into three regions that were swapped between the two proteins, in vari- 
ous combinations. The chimaeras prepared are illustrated in Fig. 1c and, 
to facilitate the identification of the chimaeras, segments were labelled 
‘5’ if derived from 53BP1 and ‘C’ if derived from Crb2. Because 53BP1 
can oligomerize”’, all experiments were carried out in cells depleted of 
endogenous 53BP1. 
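
The chimaera naming scheme can be illustrated with a short sketch. It assumes, purely for illustration, that each of the three regions (read from the N terminus to the C terminus) is taken independently from 53BP1 ('5') or Crb2 ('C'), which gives eight possible three-letter labels; only some of these (for example CC5, 5CC and C5C) correspond to constructs reported here. The function names and region descriptions are hypothetical and are not part of the original methods.

```python
from itertools import product

# Region order, N- to C-terminal, following the description of the
# focus-forming region: oligomerization region, Tudor domain, C-terminal extension.
REGIONS = ("oligomerization region", "Tudor domain", "C-terminal extension")

# Letter code used in the chimaera names (assumed: '5' = 53BP1, 'C' = Crb2).
SOURCES = {"5": "53BP1", "C": "Crb2"}


def chimaera_names():
    """Return every three-letter label built from '5' and 'C'."""
    return ["".join(combo) for combo in product("5C", repeat=len(REGIONS))]


def describe(name):
    """Map a label such as 'CC5' to the protein each region is taken from."""
    if len(name) != len(REGIONS) or any(letter not in SOURCES for letter in name):
        raise ValueError(f"not a valid chimaera label: {name!r}")
    return {region: SOURCES[letter] for region, letter in zip(REGIONS, name)}


if __name__ == "__main__":
    for label in chimaera_names():
        print(label, describe(label))
```

Under this scheme, the label CC5 denotes Crb2 carrying only the 53BP1 C-terminal extension, whereas 555 and CCC correspond to the unmodified 53BP1(FFR) and Crb2 segments, respectively.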

The domain-swapping experiments first confirmed that the Crb2 
Tudor domain can recognize H4K20me2 in human chromatin, as the 
Crb2 Tudor domain inserted into the 53BP1(FFR) supported local- 
ization to break sites (Fig. 1c, d). Second, introduction of the sequence 
immediately C-terminal of the 53BP1 Tudor domain into Crb2 (CC5 
chimaera) produced a protein that accumulated into IR-induced foci 
that co-localized with γ-H2AX (Fig. 1c, d). Notably, the accumulation 
of the CC5 chimaera at DSB sites was dependent on RNF168 (Sup- 
plementary Fig. 3a, b), strongly suggesting that sequences C-terminal 
of the 53BP1 Tudor compose the UDR. We further narrowed down 
the UDR to the region between residues 1604 and 1631 (Supplemen- 
tary Fig. 3c–e). 

Figure 1 | Identification of the 53BP1 UDR. a, Schematic representation of 
53BP1 and Crb2. b, U2OS cells transfected with GFP-53BP1 and GFP-Crb2 
expression vectors were irradiated (10 Gy) and processed for GFP imaging and 
γ-H2AX immunofluorescence (mean ± s.e.m., n = 4 except for 53BP1 WT, 
where n = 5). c, 53BP1-depleted U2OS cells transfected with the indicated 
GFP-53BP1/Crb2-derived expression vectors were irradiated (10 Gy) and 
processed as described in a (mean ± s.e.m., n = 5 except for 5CC and C5C, 
where n = 3). d, Analysis of GFP fusion protein expression by immunoblotting. 
The migration of molecular mass markers (kDa) is indicated on the left. EV, 
empty vector; IB, immunoblot. e, 53BP1-depleted U2OS cells transfected with 
vectors expressing the indicated GFP-53BP1 mutants (residues 1220-1631) 
were irradiated (10 Gy) and processed as in a (mean ± s.e.m., n = 3 except for 
WT, where n = 7, K1613A and L1619A, where n = 4, and S1631A, where 
n = 6). f, Alignment of the UDR region in 53BP1 orthologues. Arrowheads 
highlight key UDR residues. g, Relative levels of class switching to IgG1 in 
53BP1−/− murine B cells transduced with the indicated retroviruses 
(mean ± s.d., n = 3 except for L1619A, where n = 2). 

Next, we performed alanine-scanning mutagenesis of the UDR, in 
the context of 53BP1(1220-1631), to identify residues that participate 
in the recruitment of 53BP1 to DNA damage sites. These studies iden- 
tified five residues (I1617, L1619, N1621, L1622 and R1627) the muta- 
tion of which to alanine disrupts 53BP1 recruitment to DSB sites, with 
the I1617A and L1619A mutations having the strongest impact (Fig. 1e 
and Supplementary Fig. 4). The importance of L1619 and L1622 in 53BP1 
recruitment to break sites was observed previously”'. Introduction of 
these five mutations in the context of full-length 53BP1 also impaired 
IR-induced focus formation, confirming their importance for accumu- 
lation at DSB sites (Supplementary Fig. 5). The five residues important 
for the activity of the UDR are clustered in a 12-amino-acid 
segment that is highly conserved among 53BP1 orthologues in organ- 
isms that have a recognizable RNF8 pathway (Fig. 1f). 


The UDR is required for 53BP1 function 


RIF1 is the 53BP1 effector during DSB repair'®’. We therefore examined 
the contribution of the UDR in promoting RIF1 IR-induced focus 
formation. We tested eight UDR mutations introduced in a short 
interfering RNA (siRNA)-resistant 53BP1 vector: the five mutations 
that affect 53BP1 recruitment and three others (K1613A, D1616A and 
E1624A) that do not. We also included in these assays 53BP1 and 
53BP1(D1521R), our positive and negative controls, respectively. We 
observed that the mutations that impaired 53BP1 accumulation at DSB 
sites also abrogated RIF1 foci after IR (Supplementary Fig. 5). These 
results indicated that the UDR is critical for the function of 53BP1 in 
the DSB response. In further support of this observation, reconstitu- 
tion of 53BP1−/− murine B cells with either the D1521R or L1619A 
mutants failed to restore class switch recombination (CSR) from IgM 
to IgG1, whereas reintroduction of wild-type 53BP1 restored CSR 
(Fig. 1g and Supplementary Fig. 6). Furthermore, the UDR-defective 
L1619A mutant was unable to restore resistance to IR-induced DSBs 
in DT40 53BP1−/− cells, or to restrict homologous recombination in 
BRCA1 and 53BP1 co-depleted cells (Supplementary Fig. 7). Together, 
these results indicate that the UDR is necessary for the biological 
functions of 53BP1. 


53BP1 binds to ubiquitinated nucleosomes 


Next, we sought to determine the mechanism by which the UDR 
promotes 53BP1 recruitment to DSB sites. We first considered that 
the UDR might increase the affinity of 53BP1 for H4K20me2 due to its 
location; that is, apposed to the Tudor domain. We expressed GST- 
53BP1 fusion proteins consisting of the tandem Tudor domain with 
(Tudor-UDR) or without (Tudor) the UDR region and examined 
binding to a H4K20me2-derived peptide in pull-down assays. We 
observed that both proteins interacted equally well with H4K20me2 
in a manner that required the D1521 residue (Fig. 2a). The L1619A 
mutation, which abolishes UDR activity, had no effect on H4K20me2 
binding (Fig. 2a), indicating that the UDR does not have an impact on 
recognition of H4K20me2, at least in the context of a peptide. 

Figure 2 | 53BP1 binds to ubiquitinated nucleosomes. a, Streptavidin pull 
downs of the indicated GST fusion proteins with a biotinylated H4K20me2 
peptide. IB, immunoblot. b, Chromatin from HEK293 cells expressing Flag- 
RNF168 (+) or not (−) was subjected to pull-down assays with the indicated 
GST fusion proteins. c, Ubiquitination of the indicated NCPs by RNF168 and 
BMI1-RING1B. NS, nonspecific band. d, Pull-down assays of NCPs containing 
H4Kc20me2 with the indicated GST fusion proteins. NCPs were ubiquitinated 
with RNF168 as the E3 (+); a reaction without E1 (−) was used as a negative 
control. The migration of molecular mass markers (kDa) is indicated on the 
left. 

An alternative function of the UDR might be that it promotes the 
interaction of 53BP1 with chromatin. To test this possibility, we pre- 
pared polynucleosome-enriched extracts obtained from micrococcal 
nuclease digestion of human chromatin. Because RNF168 overexpres- 
sion can trigger 53BP1 accumulation on chromatin”, even in the absence 
of RNF8 (ref. 24), we prepared a set of extracts from cells that either 
overexpressed RNF168 (ref. 25), or that were transfected with an empty 
vector. RNF168 was recently shown to catalyse a new histone mark, 
H2AK13/K15 monoubiquitination (H2AK13/K15ub)”°”, and thus we 
sought to test whether 53BP1 could potentially bind to nucleosomes 
containing RNF168-ubiquitinated H2A. Immunoblotting of H2A 
showed that the global levels of H2A ubiquitination (ub-H2A) did 
not greatly change after RNF168 overexpression (Fig. 2b) because 
H2AK119ub, which is catalysed by E3s such as BMI1-RING1B*, is 
much more abundant than H2AK13ub or K15ub. These extracts were 
then subjected to GST pull-down assays using either the 53BP1 Tudor 
domain or the extended Tudor-UDR module. In the absence of exo- 
genous RNF168 expression, we observed a UDR-dependent interaction 
between 53BP1 and histones H2A, H3 and H4 (Fig. 2b). However, in 
the presence of RNF168, we observed a marked increase in the retrieval 
of mono- and diubiquitinated H2A by the Tudor-UDR protein 
(Fig. 2b). Together, these results indicated that the UDR may stimulate 
two modes of interaction between 53BP1 and nucleosomes: one mode 
that is independent of histone ubiquitination, and which may reflect 
the constitutive interaction of 53BP1 with chromatin; and a second 
mode of interaction that is dependent on H2A ubiquitination by RNF168, 
and which may represent the interaction that leads to 53BP1 accumu- 
lation at DSB sites. 


53BP1 recognizes H2AK15ub 


Because the above experiments were carried out with polynucleosomes, 
the interactions observed could be the product of avidity between the 
dimeric 53BP1 fusion protein and the multimeric nucleosomal arrays. 
Therefore, we tested whether we could detect binding between 53BP1 
and fully recombinant monomeric nucleosome core particles (NCPs) 
(Supplementary Fig. 8a). We used an N-terminal fragment of RNF168 
in ubiquitination reactions with UbcH5, which recapitulated H2AK13/ 
K15 ubiquitination”® (Fig. 2c and Supplementary Fig. 8b-d). To generate 
H2AK119ub, we used a recombinant BMI1-RING1B complex as the 
E3 (Fig. 2c and Supplementary Fig. 8b, c). For histone methylation, we 
produced methyl-lysine analogue versions of H4K20me2 (H4Kc20me2) 
and H3K9me2 (H3Kc9me2)” before octamer and nucleosome assembly. 

We first assembled H4Kc20me2-containing NCPs and subjected 
them to ubiquitination reactions in the presence of RNF168 to produce 
H2AK13/K15ub (Fig. 2d). As a control, we also carried out reactions 
without E1 and, as expected, no H2A ubiquitination was detected 
(Fig. 2d). The products of these two reactions were used in GST pull-down 
assays with various fusion proteins derived from 53BP1. We observed a 
marked, ubiquitination-dependent interaction between the 53BP1 
Tudor-UDR fusion and NCPs (Fig. 2d) that was not seen with the 
GST protein alone, the 53BP1 Tudor domain alone, the D1521R 
mutant or, finally, the L1619A mutant that disrupts UDR function. 
Together, these data indicate that the 53BP1 Tudor-UDR module 
promotes binding to methylated and ubiquitinated mononucleosomes 
in a manner that involves the same residues that are necessary for 
53BP1 accumulation at DSB sites. 

Next, we examined whether the binding of 53BP1 to NCPs was 
specific to H4K20me2 and H2AK13/K15ub. We assembled a series of 
NCPs that contained H4Kc20me2, H3Kc9me2 or their unmodified 
lysine counterparts. These NCPs were then used in ubiquitination 
reactions with RNF168 to produce H2AK13/K15ub-containing NCPs. 
When these NCPs were interrogated for binding to the 53BP1 Tudor- 
UDR module, we observed a specific interaction between 53BP1 and 
the NCPs containing H4Kc20me2 (and H2AK13/K15ub) but not 
those containing H3Kc9me2 or the unmodified H4K20 (Fig. 3a and 
Supplementary Fig. 9a). Next, we used H4Kc20me2-containing NCPs 
as substrates in ubiquitination reactions with RNF168 or BMI1- 
RING1B. Both reactions produced similar levels of monoubiquitinated 
H2A (Fig. 3b) but when they were used in GST pull-down assays with 
the 53BP1 Tudor-UDR module, we only observed an interaction with 
the RNF168-ubiquitinated NCPs. Importantly, we excluded the pos- 
sibility that the binding was due to the presence of diubiquitinated H2A 
in the RNF168 sample (Supplementary Fig. 9b). From these results, 
we conclude that the interaction between 53BP1 and nucleosomes 
requires the presence of both H4K20me2 and H2AK13/K15ub. 

Because RNF168 can ubiquitinate H2A K13 or K15 in vitro and 
in vivo’*’’, we tested whether 53BP1 displayed selectivity towards 
K13ub or K15ub. To do so, we assembled H4Kc20me2-containing 
NCPs with either H2AK13R or H2AK15R substitutions to leave K15 
or K13, respectively, as the only residue ubiquitinated by RNF168 
(Fig. 3c). To our surprise, when these ubiquitinated NCPs were used 
in pull-down assays, we found that the 53BP1 Tudor-UDR protein 
interacted specifically with NCPs containing H2AK15ub (Fig. 3c). 
This result indicates that 53BP1 has the ability to discriminate between 
two closely positioned ubiquitinated lysine residues on H2A. 
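
The qualitative pull-down outcomes described so far can be summarized as a simple truth table. The sketch below is a toy Boolean model rather than a description of binding affinity: it encodes only the reported results for the Tudor-UDR module (binding requires H4K20me2, H2A ubiquitinated on K15, an intact Tudor domain and an intact UDR). The class names, field names and function name are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nucleosome:
    """Minimal description of a nucleosome core particle (NCP)."""
    h4k20me2: bool = False   # dimethyl-lysine (analogue) on H4 Lys 20
    h2a_ub_site: str = ""    # "K13", "K15", "K119", or "" for no ubiquitin


@dataclass(frozen=True)
class TudorUDR:
    """Minimal description of a GST-53BP1 fusion protein."""
    tudor_ok: bool = True    # False models the D1521R mutation
    udr_ok: bool = True      # False models L1619A or a Tudor-only construct


def robust_pulldown(protein: TudorUDR, ncp: Nucleosome) -> bool:
    """Return True only when both marks are present and both readers are intact."""
    reads_methyl = protein.tudor_ok and ncp.h4k20me2
    reads_ubiquitin = protein.udr_ok and ncp.h2a_ub_site == "K15"
    return reads_methyl and reads_ubiquitin


if __name__ == "__main__":
    wild_type = TudorUDR()
    for site in ("K13", "K15", "K119", ""):
        ncp = Nucleosome(h4k20me2=True, h2a_ub_site=site)
        print(site or "none", robust_pulldown(wild_type, ncp))
```

In this simplified picture, an NCP ubiquitinated only on K13 (the H2AK15R substitution) or on K119 (the BMI1-RING1B product) is not retrieved, even when H4K20me2 is present.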


Molecular basis of H2AK15ub selectivity 

One possible cause for the 53BP1 selectivity towards H2AK15ub could be the 
presence of sequence elements in the H2A N-terminal tail that are recog- 
nized by 53BP1. We noted that three mutations (K9R, A14S and R17S) 
were sufficient to convert the sequence surrounding K13 in the H2AK15R 
mutant into the sequence that normally surrounds H2AK15 (Fig. 4a). We 
found that the resulting mutant (H2AK15Rm3), when ubiquitinated, 
bound robustly to the 53BP1 Tudor-UDR module (Fig. 4a). We con- 
clude that additional residues in the H2A N terminus contribute to the 
binding of 53BP1 to H2AK15ub-containing nucleosomes. 

Figure 3 | 53BP1 is a bivalent reader of the H4K20me2 and H2AK15ub 
histone marks. a, Pull-down assays of RNF168-ubiquitinated NCPs 
containing unmethylated histone H4 and H3 (no me), H4Kc20me2 or 
H3Kc9me2 with GST-Tudor-UDR. IB, immunoblot. b, Pull-down assays of 
NCPs ubiquitinated with the indicated E3s by GST-Tudor-UDR. A reaction 
without E1 (No E1) acts as a negative control. c, GST-Tudor-UDR pull-down 
assays of the indicated NCPs ubiquitinated with RNF168 (+); a reaction 
lacking E1 (−) was used as negative control. The migration of molecular mass 
markers (kDa) is indicated on the left. 

of the H2A N termini of various mutants. Mutations are highlighted in colour. 
Bottom panel: GST-Tudor-UDR pull-down assays of RNF168-ubiquitinated 
NCPs (+) assembled with H4Kc20me2 and the indicated H2A mutants. A 
reaction without E1 (−) was used as control. IB, immunoblot. b, GST-Tudor- 
UDR pull-down assays of NCPs with the indicated H2A ubiquitinations. 
c, Pull-down assays of RNF168-ubiquitinated H4Kc20me2 NCPs with the 
indicated fusion proteins. A white arrowhead highlights faint ub-H2A bands. 
d, GST-UDR pull-down assays of unmethylated NCPs that were either 
ubiquitinated with the indicated E3 or not (no E1). The migration of molecular 
mass markers (kDa) is indicated on the left. 


We next investigated whether ubiquitin recognition contributed to 
the 53BP1-NCP interaction. Ubiquitin contains a hydrophobic patch 
centred on its I44 residue that contributes to most ubiquitin-dependent 
interactions*’, and therefore we sought to test whether the ubiquitin 
I44 residue was important for 53BP1 recognition of H2AK15ub-NCP 
complexes. We used chemical ubiquitination by disulphide exchange”" 
to prepare NCPs that contained H2A chemically ubiquitinated on 
H2AK13 (H2AKc13ub), H2AK15 (H2AKc15ub) and H2AK15 ubi- 
quitinated with Ub(I44A) (H2AKc15ub(I44A); Fig. 4b). Those NCPs 
were then used in pull-down assays with the Tudor-UDR module. As 
expected, we found that the interaction was selective for H2AKc15ub 
(Fig. 4b). However, H2AKc15ub(I44A)-containing NCPs were not 
retrieved by 53BP1 (Fig. 4b), suggesting that ubiquitin recog- 
nition participates in the interaction of 53BP1 with H2AK15ub. 

In aggregate, our results support a model where the Tudor-UDR 
module comprises two histone-modification-binding domains: the 
Tudor domain that binds H4K20me2, and the UDR, which may interact 
directly with H2AK15ub. In support of this model, the transfer of the 
53BP1 UDR onto the Crb2 Tudor domain endowed it with the ability 
to robustly bind to RNF168-ubiquitinated and H4K20me2-containing 
NCPs (Supplementary Fig. 10a–d). This observation prompted us to 
produce the isolated UDR, as a GST fusion protein, and to assess its 
binding properties. As expected, the isolated UDR did not bind to 
H4K20me2 (Supplementary Fig. 10e). However, when H2AK13/ 
K15ub-containing NCPs were incubated with increasing amounts of 
the isolated UDR, we observed a dose-dependent retrieval of NCPs in a 
manner that required the critical L1619 residue (Fig. 4c). This weak 
binding of the UDR to NCPs was specific to RNF168-dependent ubi- 
quitination (Fig. 4d) and, as expected, was independent of H4K20me2 
given the absence of the Tudor domain (Fig. 4d and Supplementary 
Fig. 11a). Interestingly, we failed to detect either an interaction between 
the UDR and free ubiquitin by nuclear magnetic resonance or an 
interaction between the UDR and a ubiquitinated H2A peptide in a 
pull-down experiment (Supplementary Fig. 11b, c). These results indi- 
cate that the UDR recognizes H2AK15ub specifically in the context of 
the nucleosome. 


Discussion 


In response to DSBs, ATM signalling triggers a first wave of chromatin 
ubiquitination that is dependent on RNF8 (ref. 2). The role of RNF8, 
and its E2 UBC13, is to trigger the recruitment of RNF168 to DSB sites 
where it catalyses H2AK13/K15 monoubiquitination. Together, our 
work identifies 53BP1 as a bivalent histone modification reader that 
recognizes nucleosomes modified with H4K20me2 and the DNA- 
damage-inducible H2AK15ub mark (Supplementary Fig. 1). We propose 
that the engagement of H4K20me2 by the Tudor domain positions 
the UDR in the correct orientation to contact the epitope formed by 
H2AK15ub, a scenario supported by modelling (Supplementary Fig. 12). 
In the Supplementary Data section, we also present evidence that 53BP1 
recognizes nucleosomes minimally as a dimer (Supplementary Figs 13 
and 14). Together, these observations indicate that 53BP1 may alter 
nucleosomal array structure either by acting similarly to a wheel-clamp 
if a dimer engages a mononucleosome (Supplementary Fig. 1), or by 
acting as a ubiquitination-dependent nucleosome crosslinker if it can 
bridge adjacent nucleosomes (Supplementary Fig. 1). These plausible 
binding modes may be central to the function of 53BP1 as an inhibitor 
of end resection. 

Our experiments also identify the first, to our knowledge, site-specific 
reader of histone ubiquitination. 53BP1 is likely to be one of many 
readers that interpret the various histone ubiquitination marks iden- 
tified so far. Proteins such as ASH2L (for H2BK120ub)”, RNF168 and 
RNF169 (for H2AK13/K15ub)** are prime candidates for ubiquitin 
mark readers. RNF169 presents an attractive case because it acts as a 
competitive inhibitor of 53BP1 (refs 25, 33). These observations and 
the identification of 53BP1 as an H2AK15ub reader further emphasize 
the need to decipher the chromatin modification landscape, its regu- 
lation and its interpretation, at sites of DNA damage. 


METHODS SUMMARY 


Human cell lines were maintained at 37 °C and 5% CO2 atmosphere whereas the 
avian DT40 cells were grown at 39.5 °C and 5% CO2 atmosphere. Immuno- 
fluorescence microscopy and fluorescent protein imaging were carried out as 
described previously''’°. Recombinant protein production and pull-down assays 
were carried out as described previously*>™*. 


Full Methods and any associated references are available in the online version of 
the paper. 

METHODS 

Cell culture and plasmid transfection. Human cells were cultured in media supple- 
mented with 10% fetal bovine serum (FBS) and maintained at 37 °C in a 5% CO2 
atmosphere. U-2-OS (U2OS) cells were cultured in McCoy’s medium (Gibco). 
HEK293T and HeLa DR-GFP cells were cultured in DMEM (Gibco). HCT116 
Flp-In T-REx Flag and Flag-RNF168 stably transfected cell lines were cultured in 
DMEM supplemented with 250 μg ml−1 hygromycin B and 5 μg ml−1 blasticidin. 
HCT116 Flp-In T-REx Flag and Flag-RNF168 stably transfected cell lines were 
described previously”. U2OS and HEK293T cells were purchased from ATCC 
and HeLa DR-GFP cells were a gift from the laboratory of R. Greenberg. All cell 
lines tested negative for mycoplasma contamination. To induce protein 
expression in these cell lines, 5 μg ml−1 doxycycline was added to the culture 
medium for 24 h. DT40 cells were obtained from the laboratory of D. Xu and 
grown at 39.5 °C, 5% CO2 in RPMI 1640 medium (Gibco) supplemented with 
10% fetal calf serum, 1% chicken serum and 0.1 mM β-mercaptoethanol. Plasmid 
transfections were generally carried out using Lipofectamine 2000 Transfection 
Reagent (Invitrogen) or Effectene (Qiagen). 

Unless stated otherwise, for microscopy experiments, cells were fixed 1 h after 
irradiation (10 Gy). The DNA was also counter-stained with DAPI (not shown) 
and used to trace the outline of the nuclei. 

Retroviral restitution of 53BP1 in B cells. Class switching to IgG1 was assayed in 
53BP1−/− murine primary B cells complemented with 53BP1 (1-1711) and 
mutants thereof by retroviral delivery. Mature B lymphocytes were isolated from 
the spleens of two males and one female 8-15-week-old 53BP1−/− C57BL/6 strain 
129-Trp53bp1tm1Jc/J mice’ by depletion of CD43+ cells using CD43 microbeads 
(Miltenyi Biotech) according to the manufacturer’s instructions. Mice were 
obtained from Jackson laboratories. Purified B cells were re-suspended at a con- 
centration of 10° cells ml−1 in the presence of 50 ng ml−1 IL-4 (Preprotech) and 
25 μg ml−1 LPS (Sigma-Aldrich) or 1 μg ml−1 agonist anti-CD40 (BD) to allow 
B-cell proliferation/activation. Retroviral particles were collected from the super- 
natant of Plat-E packaging cells** transfected with 10 μg of the different retroviral 
pMX constructs. Retroviral supernatants were passed through a 0.45 μm filter and 
then ultracentrifuged at 20,000g at 25 °C for 90 min through a 20% sucrose layer to 
obtain purified virus. B cells were subsequently infected with the retroviral con- 
centrate in the presence of 8 μg ml−1 polybrene (Sigma-Aldrich) and 20 mM HEPES, 
pH 7.5 by plate centrifugation. The B-cell medium was subsequently changed and 
replaced with fresh RPMI medium supplemented with 50 ng ml−1 IL-4 and 25 μg 
ml−1 LPS or 1 μg ml−1 agonist anti-CD40 to induce class switching to IgG1. CSR 
was analysed 3 days after infection by flow cytometry as described previously’’. 
Experiments with 53BP1−/− mice (Trp53bp1tm1Jc) were carried out according 
to regulatory standards and were approved by the Mount Sinai Hospital animal 
care committee (Protocol AUP 0200a). 

Chromatin pull down. HEK293 chromatin-enriched extracts were prepared 
essentially as described’’. Chromatin pull downs were performed with 2.5 μg of 
recombinant GST-tagged proteins immobilized on glutathione sepharose 4B (GE 
Healthcare) in chromatin pull-down buffer (CPB: 50 mM Tris-HCl, pH 7.5, 
100 mM NaCl, 1 mM dithiothreitol (DTT)) for 1 h at 4 °C. Pull downs were then 
carried out by mixing 125 μg of chromatin-enriched extract isolated from cells 
stably expressing Flag-RNF168 in a final volume of 1.5 ml for 3 h at 4 °C. Pull 
downs were then washed four times with 1 ml of CPB and eluted in 2× Laemmli 
SDS-PAGE sample buffer for analysis by immunoblotting. 

NCP reconstitution. Recombinant histones were purified essentially as described**”. 
Briefly, after preparation of inclusion bodies, the histones were purified under dena- 
turing conditions on either a HiPrep 16/60 Sephacryl S-300 HR (GE Healthcare) 
size exclusion column or a 5 ml HiTrap SP HP (GE Healthcare) cation exchange 
column. Fractions containing the purified histones were pooled and extensively 
dialysed into water and 2 mM β-mercaptoethanol before lyophilization. His6- 
G76C ubiquitin and His6-I44A/G76C ubiquitin to be installed on H2A K13C or 
K15C were purified over a Ni-NTA column (Qiagen) and the His tag was removed 
with TEV protease. DTT was added to the ubiquitin to reduce any oxidized bonds 
and then quickly buffer exchanged into degassed water (with no DTT) using a PD- 
10 column (GE Healthcare). The reduced ubiquitin was immediately snap frozen 
and lyophilized to dryness. Octamers were refolded by mixing the four histones in 
equimolar ratios, followed by dialysis into 2 M NaCl, and then purified on a 
Superdex 200 10/300 GL size exclusion column (GE Healthcare). Nucleosome 
core particles (NCPs) were reconstituted as described’’. The 151-base pair DNA 
used to wrap the mononucleosomes was obtained from an EcoRV digest of 32x601 
DNA plasmid (a gift from C. Arrowsmith). 

Histone labelling. The installation of a dimethyl-lysine analogue (di-MLA) at the 
mutated cysteine of the H4K20C protein or H3K9C (C110A) protein was done 
exactly as described previously’. The di-MLA installation was confirmed by mass 
spectrometry for the H3Kc9me2 protein, and by mass spectrometry and immu- 
noblotting against H4K20me2 for the H4Kc20me2 full-length protein. Once 
labelled, the H4Kc20me2 and H3Kc9me2 proteins were refolded into octamers 
as described in the ‘NCP reconstitution’ section above. 

Installation of a modified wild-type or I44A mutant ubiquitin was achieved 
by disulphide-directed conjugation. H2A K15C or K13C were conjugated with 
G76C ubiquitin or I44A/G76C ubiquitin via disulphide exchange before octamer 
refolding. Specifically, the cysteine on the histone was activated by dissolving 
5 mg of lyophilized H2A in 1 ml of water containing 5 mM TCEP. Then 5 mg 
of DTNP (2,2'-dithiobis(5-nitropyridine); Sigma Aldrich) dissolved in 2 ml of acetic 
acid was added to the histone and this mixture was agitated at room temperature 
overnight. The activated histone reaction was then dialysed extensively against 
water before purification by S75 10/300 gel filtration (GE Healthcare) in conjuga- 
tion reaction buffer. The conjugation reaction was set up in 6 M guanidinium-HCl, 
50 mM Tris-HCl pH 6.9 at room temperature with a 2:1 ratio of lyophilized ubi- 
quitin to degassed activated histone. After 1h, the completion of the reaction was 
confirmed by mass spectrometry. Unconjugated ubiquitin and any oxidized ubi- 
quitin-ubiquitin conjugates were removed in subsequent gel filtration runs. 

In vitro ubiquitination of the NCP. Nucleosomes were ubiquitinated by incub- 
ating 2.5 μg recombinant mononucleosomes with 30 nM E1, 1.5 μM UbcH5a, 
4 μM RNF168 (1-113) or BMI1-RING1B complex, 22 μM ubiquitin or methy- 
lated ubiquitin (Boston Biochem), and 3.33 mM ATP in a buffer containing 50 mM 
Tris-HCl, pH 7.5, 100 mM NaCl, 10 mM MgCl2, 1 μM ZnOAc and 1 mM DTT at 
30°C for 2h. 

NCP pull-down assays. NCP pull downs were done in a total volume of 100 µl by
using 15-20 µl ubiquitination reaction and 4 or 8 µg GST- or MBP-protein coupled
to glutathione sepharose 4B (GE Healthcare) or amylose resin (New England 
Biolabs), respectively, in the same buffer as the peptide pull downs, except contain- 
ing 0.1% BSA. Pull-down reactions were incubated for 2 h at 4 °C. Pull downs were 
then washed three times with 0.5 ml of the pull-down buffer plus 0.1% BSA and 
eluted in 2X Laemmli SDS-PAGE sample buffer for analysis by immunoblotting. 

Plasmids. The GFP-53BP1 expression vector (DDp1910) resistant to siRNA
53BP1 no. 1 (ThermoFisher D-003548-01) was described previously''. The GFP
was swapped for mCherry using the KpnI-AscI sites to generate pcDNA5-
mCherry-FRT/TO-53BP1 (DDp2005). The 53BP1 deletion vectors (consisting 
of residues 1220-1711, 1220-1631, 1484-1603, 1484-1631 or 1604-1631) were 
created by inserting PCR-amplified fragments (derived from DDp1910) into the 
NotI and ApaI sites of pcDNA5-GFP-FRT/TO; EcoRI and NotI sites of pcDNA5-
Flag-FRT/TO-DmrA (DDp1911) and pcDNA5-HA-FRT/TO-DmrC (DDp1912);
and in the BamHI and EcoRI sites of a modified pETM-30-02 vector in which the
ORF of GST was inserted between the hexahistidine tag and the TEV cleavage
site or between the BamHI and PstI sites of pMAL-c2X (New England Biolabs).
Mammalian expression vectors for the components of the heterodimerization 
system were generated by PCR amplification of DmrA (FKBP12) and DmrC 
(FRB) from pLVX-Het-2 and pLVX-Het-1 (iDimerize Inducible Heterodimer 
System, Clontech), respectively, and by ligation into the BamHI and EcoRI sites
of pcDNA5-Flag-FRT/TO and pcDNA5-HA-FRT/TO (DDp1915). The GFP- 
Crb2 and GFP-CC (that is, residues 1-507 of Crb2) vectors (DDp1913 and 
DDp1914, respectively) were constructed by inserting PCR-amplified fragments 
of Crb2 into the NotI and ApaI sites of pcDNA5-GFP-FRT/TO or pcDNA5-GFP-
NLS-FRT/TO (DDp1916). The source of the Crb2 coding sequence was the 
pJK148-Crb2 plasmid (gift from L.-L. Du)**. Chimaeras of 53BP1 and Crb2 were 
obtained by annealing overlapping PCR fragments (where 555 = amino acids 
1220-1483, 1484-1603, 1604-1711; and CCC = amino acids 1-357, 358-507, 
507-708). The annealed fragments were then ligated into the NotI and ApaI sites
of pcDNA5-GFP-NLS-FRT/TO (DDp1916). The GST-Crb2 Tudor domain (that 
is, residues 358-507 of Crb2) alone or fused to 53BP1 UDR (residues 1604-1631), 
yielding GST-Tudor(C) and GST-Tudor(C)-UDR(5) vectors, respectively, were 
constructed by inserting PCR-amplified sequences into the BamHI and EcoRI sites 
of a modified pETM-30-02 described above. Bacterial expression vectors for histones
(His6-human H2A in pET15b, His6-human H2B in pET15b, Xenopus laevis
H3 in pET3d and X. laevis H4 in pET3a) were obtained from C. Arrowsmith. The 
RNF168 bacterial expression vector (residues 1-113; DDp1878) was obtained by 
PCR amplification of the DDp1109 (ref. 25) and cloned into pPROEX HTa 
(Invitrogen) using the BamHI and SpeI sites. The BMI1-His6 (residues 1-108)
bacterial expression vector (DDp1886) was obtained by PCR amplification of
pGEX-4T1-BMI1 (1-108) and cloned into pET24b(+) using NdeI and XhoI sites.
The RING1B (residues 1-116) bacterial expression vector (DDp1887) was obtained
by PCR amplification of pET28-MHL-RING1B(1-120) and cloned into pGEX-6P-1
using the BamHI and NotI sites. pGEX-4T1-BMI1(1-108) and pET28-MHL-
RING1B(1-120) were gifts of Y. Tong. The retroviral vector pMX-53BP1 (1-1711)
and its D1521R derivative were gifts of A. Nussenzweig’. All mutations were 
introduced by site-directed mutagenesis using QuikChange (Stratagene) and all 
plasmids were sequence-verified. 

Subcellular fractionation. A cytoplasmic fraction (CYTO) was obtained by collecting
HEK293 cells in EBC1 buffer (50 mM Tris-HCl pH 7.5, 100 mM NaCl,
0.5% IGEPAL CA-630, 1 mM EDTA, 1 mM DTT, 1X protease inhibitors (Complete,
EDTA-free; Roche)). After centrifugation at 1,000g for 15 min at 4 °C,
the nuclear pellet was re-suspended and periodically vortexed in EBC2 buffer
(50 mM Tris-HCl pH 7.5, 300 mM NaCl, 5 mM CaCl2, 1X protease inhibitors (Complete,
EDTA-free; Roche)) over 30 min. Following centrifugation at 1,000g
for 15 min, the supernatant was harvested as the nuclear soluble fraction (NS). 
The remaining insoluble chromatin fraction was then solubilized by micrococcal 
nuclease digestion for 30 min at 30 °C and centrifugation at 1,000g for 15 min; the 
supernatant was collected as the chromatin fraction (CHR).

RNA interference. All siRNAs used in this study were single duplex siRNAs 
purchased from ThermoFisher. RNA interference (RNAi) transfections (40 nM) 
were performed using DharmaFECT 1 (ThermoFisher) or RNAiMax (Invitrogen) 
in a forward transfection mode, following the manufacturer’s protocol. The indivi- 
dual siRNA duplexes used were: 53BP1 (ThermoFisher, D-003548-01, target sequence: 
5'-GAGAGCAGAUGAUCCUUUA-3’), RNF168 (ThermoFisher, D-007152-04, 
target sequence: 5'-GAAGAGUCGUGCCUACUGAUU-3’), BRCA1 (ThermoFisher 
D-003461-05, target sequence: 5'-CAGCUACCCUUCCAUCAUAUU-3’) and 
non-targeting control siRNA (ThermoFisher, D-001210-02, target sequence: 5’- 
UAAGGCUAUGAAGAGAUAC-3’). Except when stated otherwise, siRNAs were 
transfected 48 h before cell processing. 

Antibodies. We used the following antibodies: mouse anti-53BP1 (clone 19, BD 
Biosciences), rabbit anti-53BP1 (A300-273A, Bethyl), mouse anti-γ-H2AX (clone
JBW301, Millipore), rabbit anti-γ-H2AX (no. 2577, Cell Signaling Technologies),
rabbit anti-BRCA1 (no. 07-434, Millipore), rabbit anti-KAP1 (A300-274A, Bethyl),
goat anti-RIF1 (N20) (sc55979, Santa Cruz), mouse anti-Flag (clone M2, Sigma), 
mouse anti-tubulin (clone DM1A, Calbiochem), mouse anti-GFP (Roche), rabbit 
anti-cyclin A (gift from M. Pagano), mouse anti-HA (F-7, sc-7392, Santa Cruz), rabbit 
anti-H2A (ab18255, Abcam), rabbit anti-H4 (NBP1-19404, Novus Biologicals), 
rabbit anti-H3 (ab1791, Abcam), rabbit anti-H2B (ab1790, Abcam), rabbit anti- 
H4K20me2 (9759, Cell Signaling Technologies), rabbit anti-GST (sc-459, Santa 
Cruz), mouse anti-MBP (E8032, NEB), rabbit anti-ubiquitin (Z0458, Dako) and
mouse anti-actin (CP01, Calbiochem). Peroxidase-AffiniPure goat anti-rabbit IgG
(111-035-144, Jackson ImmunoResearch) and HRP-linked sheep anti-mouse IgG
(NA931, GE Healthcare) were used as secondary antibodies in immunoblotting. 
The following antibodies were used as secondary antibodies in immunofluores- 
cence microscopy: Alexa Fluor 488 goat anti-mouse, Alexa Fluor 488 donkey anti- 
mouse, Alexa Fluor 555 goat anti-mouse, Alexa Fluor 555 goat anti-rabbit, Alexa 
Fluor 555 donkey anti-goat, Alexa Fluor 647 donkey anti-mouse (Molecular Probes). 

Fluorescence recovery after photobleaching (FRAP). For FRAP experiments,
cells were seeded onto 25-mm round coverslips, transferred to a Chamlide Chamber
and imaged using a Quorum WaveFX Spinning Disc Confocal System (Quorum
Technologies) equipped with a 63X oil objective and a temperature-controlled
chamber (37 °C, 5% CO2). All images were acquired using Volocity Software
(Improvision). FRAP experiments were performed 24 h after cell transfection.
Five images were acquired before photobleaching a region of interest using a
Photonic Instruments Mosaic (450-515 nm Ar laser, 0.6 s) to achieve at least
60-70% of measured fluorescence loss. Images were then acquired every 0.08 s
for 20 s. Image processing was performed using Volocity and included photobleaching
and background correction. Recovery time was obtained by fitting a
single exponential equation. As the image set of each sample was acquired with 
nonuniform time intervals, a cubic spline interpolation technique was used to 
resample data on a common time base. 
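
The single-exponential recovery fit described above can be illustrated with a minimal sketch. It is not the Volocity procedure used in the study: it assumes that the corrected intensity follows F(t) = a + b·exp(−t/τ), scans τ over a hypothetical grid, and solves for a and b by linear least squares at each τ. The function name `fit_single_exponential`, the grid and the synthetic trace are all assumptions made for illustration, and the cubic spline resampling step is omitted.

```python
import math

def fit_single_exponential(times, values, tau_grid):
    """Fit values ~ a + b*exp(-t/tau); return (tau, a, b) with the smallest squared error."""
    best = None
    n = len(times)
    for tau in tau_grid:
        x = [math.exp(-t / tau) for t in times]
        # Ordinary least squares for the two linear parameters a and b.
        mean_x = sum(x) / n
        mean_y = sum(values) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, values))
        b = sxy / sxx
        a = mean_y - b * mean_x
        sse = sum((a + b * xi - yi) ** 2 for xi, yi in zip(x, values))
        if best is None or sse < best[0]:
            best = (sse, tau, a, b)
    _, tau, a, b = best
    return tau, a, b

# Synthetic, noise-free trace sampled every 0.08 s for 20 s, with a hypothetical
# recovery time of 2.0 s and 65% fluorescence loss immediately after bleaching.
times = [0.08 * i for i in range(251)]
true_tau = 2.0
values = [1.0 - 0.65 * math.exp(-t / true_tau) for t in times]

tau_grid = [0.05 * k for k in range(1, 201)]  # candidate tau values, 0.05 s to 10 s
tau, plateau, amplitude = fit_single_exponential(times, values, tau_grid)
half_time = tau * math.log(2)
print(f"tau = {tau:.2f} s, plateau = {plateau:.2f}, t1/2 = {half_time:.2f} s")
```

A grid scan is used here only because it needs no external libraries; a nonlinear least-squares routine would serve equally well on real, noisy traces.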

Recombinant protein production. GST and MBP fusion proteins were produced
as previously described****. Briefly, MBP and GST proteins expressed in
Escherichia coli were purified on amylose (New England Biolabs) or glutathione
sepharose 4B (GE Healthcare) resins according to the batch method described by
the manufacturer and stored in 50 mM HEPES pH 7.5, 150 mM NaCl, 5% glycerol.
His6-UbcH5a and His6-RNF168 (1-113) were purified on Ni-NTA agarose
(Qiagen) and stored in 50 mM Tris-HCl pH 7.5, 1 mM EDTA, 10% glycerol. For
NMR, the proteins were further purified by gel filtration in NMR buffer (S75 10/300
HiPrep Superdex, GE Healthcare). The pET24b(+)-BMI1 and pGEX-6p-1-
RING1B expression plasmids were co-transformed and the proteins were purified
as a complex as described”.

Peptides. The H4K20me2 peptide (H4K20me2: biotin-YGKGGAKRHR-K(me2)-
VLRD) was purchased from BioBasic and the H2AK15ub peptide (biotin-spacer- 
ARAKAK(Ub)SRSSR; Spacer = 8-amino-3,6-dioxaoctanoic acid) was purchased 
from Lifesensors Inc. 

Peptide pull downs were performed by incubating 2.5 µM MBP or GST-tagged
53BP1 proteins with 25 µM of the indicated biotinylated histone H4-derived
peptide in peptide pull-down buffer (PPB) (50 mM Tris-HCl pH 8.0, 150 mM
NaCl, 0.05% NP-40, 1% BSA). After 2 h at 4 °C, 10 µl of the pull-down reaction
mixture was removed as input control and 10 µl of streptavidin-Dynabeads (Dynal)
were added to the pull-down mixture and incubated for an additional 30 min at
4 °C. The Dynabeads were then washed twice with 750 µl PPB, twice with 750 µl of
50 mM Tris-HCl pH 8.0, 150 mM NaCl and were then eluted in 25 µl 2X Laemmli
SDS-PAGE sample buffer for analysis by immunoblotting. 

Gel filtration. Estimation of the GST-53BP1 Tudor-UDR and MBP-53BP1 
Tudor-UDR molecular masses in solution was done by gel filtration analysis 
using a 24-ml S200 Superdex column (S200 10/300 GL; GE Healthcare) in
50 mM HEPES pH 7.5, 150 mM NaCl. Approximately 400 µg of purified GST
or MBP-tagged protein was injected onto the column. The molecular mass of each 
sample was estimated according to the elution profile of gel filtration standard 
molecular weight markers (151-1901, Bio-Rad). 
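
Estimating a molecular mass from the elution profile of standards is commonly done with a log-linear calibration, sketched below. The nominal masses are those of the components of the Bio-Rad gel filtration standard; the elution volumes, the sample volume and the function names are hypothetical values chosen for illustration, not measurements from this study.

```python
import math

# (name, nominal mass in kDa, elution volume in ml).
# The elution volumes are hypothetical and serve only to illustrate the fit.
standards = [
    ("thyroglobulin", 670.0, 9.0),
    ("gamma-globulin", 158.0, 11.7),
    ("ovalbumin", 44.0, 14.1),
    ("myoglobin", 17.0, 15.9),
    ("vitamin B12", 1.35, 20.6),
]

def fit_calibration(points):
    """Least-squares line log10(mass) = slope * volume + intercept."""
    xs = [volume for _, _, volume in points]
    ys = [math.log10(mass) for _, mass, _ in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

def estimate_mass(volume_ml, slope, intercept):
    """Apparent molecular mass (kDa) for a peak eluting at volume_ml."""
    return 10 ** (slope * volume_ml + intercept)

slope, intercept = fit_calibration(standards)
sample_volume_ml = 13.0  # hypothetical elution volume of a tagged protein
print(f"apparent mass: {estimate_mass(sample_volume_ml, slope, intercept):.0f} kDa")
```

Comparing the apparent mass with the mass calculated from the sequence indicates whether the protein elutes as a monomer or as a higher-order species.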

NMR spectroscopy. NMR data were acquired at 25°C on a 600 MHz Bruker 
AVANCE III spectrometer equipped with a 1.7 mm TCI CryoProbe. Two-
dimensional ¹H,¹⁵N HSQC (heteronuclear single quantum coherence) spectra
were collected for 0.2 mM ¹⁵N-ubiquitin in the absence or presence of GST, GST-
53BP1-UDR (1604-1631) or CDC34. All NMR samples were prepared in 50 mM
HEPES, pH 7.5, 100 mM NaCl, 1 mM DTT and 10% D2O. Backbone resonance
assignments for human ubiquitin have been reported previously*°. Residue A46 
was not seen in the spectra, and therefore was excluded from analysis. 

Mass spectrometry. Electrospray ionization mass spectrometry analysis was 
performed on an Agilent LC/MSD TOF mass spectrometer. Samples were diluted 
in 0.1% trifluoroacetic acid before analysis. Deconvolution was performed using 
Agilent MassHunter workstation software for the analysis of modified histones. 

Most stars and their planets form in open clusters. Over 95 per cent
of such clusters have stellar densities too low (less than a hundred 
stars per cubic parsec) to withstand internal and external dynamical 
stresses and fall apart within a few hundred million years’. Older 
open clusters have survived by virtue of being richer and denser in 
stars (1,000 to 10,000 per cubic parsec) when they formed. Such 
clusters represent a stellar environment very different from the birth- 
place of the Sun and other planet-hosting field stars. So far more 
than 800 planets have been found around Sun-like stars in the field’. 
The field planets are usually the size of Neptune or smaller**. In 
contrast, only four planets have been found orbiting stars in open 
clusters®*, all with masses similar to or greater than that of Jupiter. 
Here we report observations of the transits of two Sun-like stars by 
planets smaller than Neptune in the billion-year-old open cluster 
NGC6811. This demonstrates that small planets can form and survive 
in a dense cluster environment, and implies that the frequency and 
properties of planets in open clusters are consistent with those of 
planets around field stars in the Galaxy. 

Previous planet surveys in clusters have suffered from insufficient 
sensitivity to detect small planets, and from sample sizes barely large 
enough to find the less common larger planets’. The recent discovery 
by the Doppler method of two giant planets around Sun-like stars in 
the Praesepe open cluster® set a preliminary lower limit to the rate of 
occurrence of hot Jupiters in that cluster. This frequency is not incon- 
sistent with that in the field, after accounting for the enriched metallicity 
of Praesepe’’ and the positive correlation between stellar metallicity and 
the frequency of giant planets''. However, it does not address the fre- 
quency of smaller planets such as those more commonly found around 
field stars. NASA’s Kepler telescope is sensitive enough to detect planets 
of the size of Neptune or smaller, using the transit technique. 

Our detection of two mini-Neptunes (two to four Earth radii, R⊕) in
NGC6811 is the result of a survey of 377 stars in the cluster as part of
The Kepler Cluster Study’. The two planets, Kepler-66b and Kepler-
67b, have radii of 2.8 R⊕ and 2.9 R⊕ and are each transiting (passing in
front of) a Sun-like star in NGC6811 once every 17.8 and 15.7 days, 
respectively. Kepler-66b and Kepler-67b are the smallest planets to be 
found in a star cluster, and the first cluster planets seen to transit their 
host stars, which enables the measurement of their sizes. 

The properties derived for the two planets depend directly on the 
properties determined for their parent stars (Kepler-66 and Kepler-67). 
Because the members of NGC6811 form a coeval, co-spatial and chemi- 
cally homogeneous collection of stars, they trace a distinct sequence in 
the colour-magnitude diagram (Fig. 1a). This allows both their commonly 
held properties (such as age and distance) and their individual physical 
characteristics (such as masses, radii and temperatures) to be determined 
reliably from stellar evolution models’*"*. Kepler-66b and Kepler-67b 
therefore join a small group of planets with precisely determined ages,
distances and sizes. Table 1 lists the model-derived properties of the two
planets and their host stars. Figure 1a shows the locations of Kepler-66 
and Kepler-67 in the colour-magnitude diagram for NGC6811, and 
Fig. 2 displays their phase-folded transit light curves reduced and cali- 
brated by the Kepler pipeline’. 

The membership of Kepler-66 and Kepler-67 to NGC6811 was 
established from a five-year radial-velocity survey (see Supplemen- 
tary Information). They are both secure radial-velocity members of 
NGC6811 and are located squarely on the cluster sequence in the 
colour-magnitude diagram (Fig. 1a). Their rotation periods listed in
Table 1 were determined from the periodic, out-of-transit, brightness 
variations in the Kepler light curves, caused by star spots being carried 
around as the star spins (see Supplementary Information). The rotation 
periods provide additional confirmation of cluster membership, as 
they obey the distinct relationship between stellar rotation and colour 
observed for other members of NGC6811. Figure 1b shows the colour 
versus rotation period diagram plotted for radial-velocity members of 
the cluster’®. 

Because of the large distance to NGC6811, the two host stars are too 
faint (see Table 1) for their radial velocities to be measured with suffi- 
cient precision to confirm the status of Kepler-66b and Kepler-67b as 
true planets in the usual way, that is, by establishing that their masses 
are in the planetary range. To validate them as planets we instead 
applied a statistical procedure known as BLENDER (see Supplemen- 
tary Information), by which we have demonstrated that they are much 
more likely to be planets than false positives. We determined proba- 
bilities of only 0.0019 and 0.0024 that Kepler-66b and Kepler-67b are 
false positives. 

To establish whether finding two mini-Neptunes in NGC6811 is 
consistent with the rate of occurrence of planets in the field, we con- 
ducted a Monte Carlo experiment using the known spectral type and 
magnitude distributions of the 377 member stars. We simulated true 
planets adopting distributions of planet sizes and orbital periods cor- 
responding to those found in the Kepler field, along with planet occur- 
rence rates based on a statistical study of the Kepler candidates that 
accounts for the incidence of false positives as well as incompleteness’. 
We retained only the simulated planets that would be detectable by 
Kepler on the basis of real noise estimates for each star. We repeated 
the simulation 1,000 times to predict the average number of transiting 
planets of all sizes we would expect to detect among the known cluster 
members observed by Kepler, as well as their period and size distributions
(Fig. 3). The result, 4.0 ± 2.0 planets, is consistent with our two
planet detections. The expected number of 2.2 ± 1.5 mini-Neptunes is
also consistent with our detection of two such planets, and the lack of
smaller and larger transiting planets in NGC6811 similarly agrees with
their predicted detection rates of 1.2 ± 1.1 for Earths and super-Earths
(0.8-2 R⊕) and 0.6 ± 0.6 for giant planets (>4 R⊕). Together, the results

Figure 1 | The colour-magnitude and colour-period diagrams for
NGC6811. a, The colour-magnitude diagram for stars within a 1-degree-
diameter field centred on NGC6811 with the locations of Kepler-66 and Kepler-
67 marked by black circles. Cluster members, marked with larger red dots, trace
a well-defined relationship between stellar mass (colour, B − V) and luminosity
(brightness, V) that can be fitted by stellar evolution models to determine the
age and distance of NGC6811 as well as the masses and radii of its members. By
this method NGC6811 is found to be 1.00 ± 0.17 billion years old and


imply that the planet frequency in NGC6811 is consistent with that of 
the field. 
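
The structure of such a Monte Carlo experiment can be sketched in a few lines. This is a deliberately simplified illustration, not the simulation used in the study: the three per-star probabilities below are hypothetical placeholders, whereas the actual experiment draws planet sizes and periods from the Kepler field distributions and uses real noise estimates for each star. Only the number of stars (377) and the number of repetitions (1,000) are taken from the text, so the output is not expected to reproduce the quoted 4.0 ± 2.0 planets.

```python
import random

N_STARS = 377    # cluster members surveyed (from the text)
N_TRIALS = 1000  # number of repeated simulations (from the text)

# Hypothetical per-star probabilities, chosen only to make the sketch run.
P_HAS_PLANET = 0.30  # assumed chance that a star hosts a planet in the period range
P_TRANSIT = 0.04     # assumed geometric transit probability, roughly R_star / a
P_DETECT = 0.50      # assumed chance that a transiting planet exceeds the noise

def simulate_once(rng):
    """Count detected transiting planets in one simulated survey of the cluster."""
    detected = 0
    for _ in range(N_STARS):
        if (rng.random() < P_HAS_PLANET
                and rng.random() < P_TRANSIT
                and rng.random() < P_DETECT):
            detected += 1
    return detected

def run(seed=0):
    """Repeat the survey N_TRIALS times; return the mean count and its scatter."""
    rng = random.Random(seed)
    counts = [simulate_once(rng) for _ in range(N_TRIALS)]
    mean = sum(counts) / N_TRIALS
    variance = sum((c - mean) ** 2 for c in counts) / (N_TRIALS - 1)
    return mean, variance ** 0.5

mean, spread = run()
print(f"expected detections: {mean:.1f} +/- {spread:.1f}")
```

The scatter across trials plays the role of the quoted uncertainty: an observed count is consistent with the field rate if it lies within roughly one such scatter of the simulated mean.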

The members of NGC6811 fall entirely within the range of stellar 
spectral types selected for the Kepler planet survey, and the slightly sub- 
solar metallicity of NGC6811 (ref. 17) is close to the average metallicity 
of the Galactic disk population from which the Kepler targets are drawn. 
Therefore, correlations between planet frequency and stellar mass and/ 


Table 1 | Stellar and planetary parameters for Kepler-66 and Kepler-67

Stellar properties                 Kepler-66                 Kepler-67
Right ascension                    19 h 35 min 55.573 s      19 h 36 min 36.799 s
Declination                        46° 41' 15.906"           46° 09' 59.181"
Spectral type                      G0V                       G0V
Effective temperature, Teff (K)    5,962 ± 79                5,331 ± 63
log[Surface gravity (cm s^-2)]     4.484 ± 0.023             4.594 ± 0.022
Rotation period (days)             9.97 ± 0.16               10.61 ± 0.04
Mass (solar masses)                1.038 ± 0.044             0.865 ± 0.034
Radius (solar radii)               0.966 ± 0.042             0.778 ± 0.031
Density (solar)                    1.15 ± 0.15               1.89 ± 0.17
Visual magnitude, V                15.3                      16.4
Age (billion years)                1.00 ± 0.17 (NGC6811)
Distance (parsec)                  1,107 ± 90 (NGC6811)
Metallicity, Z                     0.012 ± 0.003 (NGC6811)

Planetary parameters               Kepler-66b                Kepler-67b
Orbital period (days)              17.815815 ± 0.000075      15.72590 ± 0.00011
Impact parameter                   0.56 ± 0.26               0.37 ± 0.21
Time of mid-transit (BJD)          2454967.4854 ± 0.0025     2454966.9855 ± 0.0048
Planet-to-star radius ratio        0.02646 ± 0.00097         0.03451 ± 0.0013
Scaled semi-major axis (a/Rstar)   30.3 ± 1.0                32.4 ± 1.1
Semi-major axis (AU)               0.1352 ± 0.0017           0.1171 ± 0.0015
Radius (R⊕)                        2.80 ± 0.16               2.94 ± 0.16

The age, distance and chemical composition of NGC6811 were determined from a maximum- 
likelihood fit of stellar evolution models'*4 to the cluster sequence in the colour-magnitude diagram 
using Bayesian statistics and a Markov-chain Monte Carlo algorithm’. The best-fitting stellar 
isochrone** and photometric measurements in all available bandpasses (UBV, griz, JHK and D51 
magnitude) were used to derive the effective temperatures, surface gravities, masses, radii and 
densities for Kepler-66 and Kepler-67. The transit and orbital parameters (period, impact parameter, 
time of mid-transit, radius ratio, and scaled semi-major axis) for Kepler-66b and Kepler-67b were 
derived from the Kepler photometry using a Markov-chain Monte Carlo procedure with the mean stellar 
density as a prior?®. The parameters for Kepler-67b account for minor dilution from a close companion 
to the star described in section 3.2 of the Supplementary Information. Errors given for stellar and
planetary parameters are 1σ uncertainties. BJD is barycentric Julian date, and AU is astronomical units.
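
As a consistency check on Table 1, the planetary radii and semi-major axes can be recomputed from the other tabulated quantities. The sketch below is an illustration, not the authors' Markov-chain Monte Carlo procedure: it multiplies the radius ratio by the stellar radius, and obtains the semi-major axis both from Kepler's third law and from the scaled semi-major axis. The physical constants and the function names are assumptions introduced here.

```python
R_SUN_KM = 695_700.0    # nominal solar radius
R_EARTH_KM = 6_378.1    # Earth's equatorial radius
AU_KM = 149_597_870.7   # astronomical unit
DAYS_PER_YEAR = 365.25

def planet_radius_earth(radius_ratio, r_star_solar):
    """Planet radius in Earth radii from the planet-to-star radius ratio."""
    return radius_ratio * r_star_solar * R_SUN_KM / R_EARTH_KM

def semi_major_axis_kepler(period_days, m_star_solar):
    """Kepler's third law in solar units, a^3 = M * P^2, neglecting the planet's mass."""
    period_years = period_days / DAYS_PER_YEAR
    return (m_star_solar * period_years ** 2) ** (1.0 / 3.0)

def semi_major_axis_scaled(a_over_rstar, r_star_solar):
    """Semi-major axis in AU from the scaled semi-major axis a/Rstar."""
    return a_over_rstar * r_star_solar * R_SUN_KM / AU_KM

# Values copied from Table 1.
table1 = {
    "Kepler-66b": dict(ratio=0.02646, r_star=0.966, m_star=1.038,
                       period=17.815815, a_scaled=30.3),
    "Kepler-67b": dict(ratio=0.03451, r_star=0.778, m_star=0.865,
                       period=15.72590, a_scaled=32.4),
}

results = {}
for name, p in table1.items():
    results[name] = (
        planet_radius_earth(p["ratio"], p["r_star"]),
        semi_major_axis_kepler(p["period"], p["m_star"]),
        semi_major_axis_scaled(p["a_scaled"], p["r_star"]),
    )
    radius, a_kepler, a_scaled = results[name]
    print(f"{name}: R = {radius:.2f} R_earth, "
          f"a = {a_kepler:.4f} AU (Kepler), {a_scaled:.4f} AU (scaled)")
```

Both routes to the semi-major axis should agree with the tabulated values to within the quoted uncertainties, which is a quick way to spot transcription errors in the table.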
1,107 ± 90 parsecs distant’. b, The colour-period diagram for 72 NGC6811
members’®. The rotation periods are determined from periodic brightness 
variations in the Kepler light curves, and the error bars represent the dispersion 
of multiple period measurements. As in the colour-magnitude diagram, cluster 
members trace a well-defined relation between stellar colour and rotation 
period. The locations of Kepler-66 and Kepler-67 on the cluster sequence are 
marked by orange star symbols. 


or metallicity are not a concern when comparing the frequency and size 
distribution of planets in NGC6811 to that of the field. The detection of 
Kepler-66b and Kepler-67b thus places the first robust constraint on 
the frequency of small planets in open clusters relative to the field. 

The comparison in Fig. 3 of the orbital periods and radii of Kepler- 
66b and Kepler-67b with those in our simulated distributions shows 
that the sizes and orbital properties of the two planets are similar to 
those of the most common types of field planets (2-3 R⊕, and orbital
periods between 10 and 20 days). This suggests that the sizes and 
orbital properties of planets in open clusters are also not unlike those 
in the field. 

The masses, structures and compositions of Kepler-66b and Kepler- 
67b can be constrained using theoretical models. With radii in excess 
of 2 R⊕, the two planets probably contain significant quantities of
volatiles in the form of astrophysical ices and up to a few per cent of 
H or He by mass. Volatile-poor rocky planets this large would have 
Saturn-like masses of 82-117 Earth masses (assuming an Earth-like 
composition with 32% iron core and 68% silicates by mass), and would 
be larger and more massive than any rocky exoplanet discovered to 
date. Instead, Kepler-66b and Kepler-67b are likely to have structures 
and compositions that resemble those of Neptune and, following mass-radius
relations for exoplanets in the field’, probably have masses less
than 20 Earth masses (see Supplementary Information). 

For NGC6811 to have survived a billion years, the initial number 
density of stars in the cluster must have been at least that of the Orion 
Trapezium cluster (about 13,000 per cubic parsec) and thus more than 
two orders of magnitude greater than that of the typical cluster formed 
in a molecular cloud (about a hundred stars per cubic parsec; ref. 1). 
Highly energetic phenomena including explosions, outflows and winds 
often associated with massive stars would have been common in the 
young cluster. The degree to which the formation and evolution of 
planets is influenced by such a dense and dynamically and radiatively
hostile environment is not well understood, either observationally or 
theoretically” *°. The formation of planets takes place in the circum- 
stellar disks during the first few million years of a star’s life, which is the 
typical lifetime of disks*®. We estimated the number and mass-distri- 
bution of stars in NGC6811 at the time Kepler-66b and Kepler-67b 
formed by fitting a canonical initial mass function” to the current 
distribution of masses for members in the cluster (see Supplemen- 
tary Information). The calculation suggests that the cluster contained 

Figure 2 | Transit light curves. a, b, The Kepler light curves for Kepler-66
(a) and Kepler-67 (b). The photometric measurements (grey points) were 
acquired in long cadence mode (30-min total exposures) and have been 
detrended”*, normalized to the out-of-transit flux level, and phase-folded on the 


at least 6,000 stars during the era of planet formation, including several 
O stars (masses greater than 20 solar masses) and more than one 
hundred B stars (masses between 3 and 20 solar masses). The discovery 
of two mini-Neptunes in NGC6811 thus provides evidence that the 
formation and long-term stability of small planets is robust against 

Figure 3 | Distribution of planetary properties. a, b, Histograms of planetary
radii (a) and orbital periods (b) of simulated transiting planets expected in
NGC6811, accounting for incompleteness and assuming the same period and
size distribution and occurrence rate as in the field’. The properties of
Kepler-66b and Kepler-67b are similar to those of the most commonly expected
planets. The widths of the red and blue vertical lines reflect ±1σ errors in the
radii and periods of the two planets. 


periods of the transiting planets. The blue data points and error bars represent 
the same data phase-binned in 30-min intervals and the standard error of the 
mean, respectively. Transit models smoothed to the same cadence are 
overplotted in red. 


stellar densities that are extremely high for open clusters, and the violent 
deaths and high-energy radiation of nearby massive stars. 
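
The quoted numbers of massive stars can be checked approximately with a two-segment power-law initial mass function. The sketch below is a forward calculation under stated assumptions, not the fit performed in the study: it assumes slopes of 1.3 below 0.5 solar masses and 2.3 above, mass limits of 0.08 and 150 solar masses, and a total of 6,000 stars. The function names are hypothetical.

```python
def powerlaw_integral(lo, hi, alpha):
    """Integral of m**(-alpha) dm from lo to hi (alpha != 1)."""
    p = 1.0 - alpha
    return (hi ** p - lo ** p) / p

M_BREAK = 0.5  # solar masses; assumed break between the two IMF segments

# (lower mass, upper mass, slope alpha, coefficient)
SEGMENTS = [
    (0.08, M_BREAK, 1.3, 1.0),
    (M_BREAK, 150.0, 2.3, M_BREAK ** (2.3 - 1.3)),  # keeps the IMF continuous at the break
]

def relative_number(lo, hi):
    """Unnormalized number of stars with masses between lo and hi (solar masses)."""
    total = 0.0
    for seg_lo, seg_hi, alpha, coeff in SEGMENTS:
        a, b = max(lo, seg_lo), min(hi, seg_hi)
        if a < b:
            total += coeff * powerlaw_integral(a, b, alpha)
    return total

N_TOTAL = 6000  # lower limit on the initial number of stars (from the text)
norm = relative_number(0.08, 150.0)
n_b_stars = N_TOTAL * relative_number(3.0, 20.0) / norm
n_o_stars = N_TOTAL * relative_number(20.0, 150.0) / norm
print(f"B stars (3-20 Msun): {n_b_stars:.0f}, O stars (>20 Msun): {n_o_stars:.0f}")
```

Under these assumptions the calculation yields on the order of ten O stars and somewhat more than one hundred B stars, in line with the statement in the text; the result is sensitive to the assumed mass limits.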

Global resurfacing of Mercury 4.0-4.1 billion years
ago by heavy bombardment and volcanism


Simone Marchi, Clark R. Chapman, Caleb I. Fassett, James W. Head, W. F. Bottke & Robert G. Strom


The most heavily cratered terrains on Mercury have been estimated 
to be about 4 billion years (Gyr) old’, but this was based on images 
of only about 45 per cent of the surface; even older regions could 
have existed in the unobserved portion. These terrains have a lower 
density of craters less than 100 km in diameter than does the Moon'*”, 
an observation attributed to preferential resurfacing on Mercury. 
Here we report global crater statistics of Mercury’s most heavily 
cratered terrains on the entire surface. Applying a recent model 
for early lunar crater chronology® and an updated dynamical extra- 
polation to Mercury’, we find that the oldest surfaces were emplaced 
just after the start of the Late Heavy Bombardment (LHB) about 
4.0-4.1 Gyr ago. Mercury’s global record of large impact basins*, 
which has hitherto not been dated, yields a similar surface age. This 
agreement implies that resurfacing was global and was due to vol- 
canism, as previously suggested’. This activity ended during the tail 
of the LHB, within about 300-400 million years after the emplace- 
ment of the oldest terrains on Mercury. These findings suggest that 
persistent volcanism could have been aided by the surge of basin- 
scale impacts during this bombardment. 

The earliest geological features that have been detected on Mercury, 
the heavily cratered terrains, show signs of ancient resurfacing as 
shown by the intercrater smooth plains. Early work, based on partial 
coverage by Mariner 10 images, suggested that both volcanism’ and 
basin ejecta’ could have been responsible for the formation of the intercrater
plains. Recent work'® based on high-resolution imaging from
MESSENGER (MErcury Surface, Space ENvironment, GEochemistry, 
and Ranging) presented evidence that the extensive intercrater plains 
seen in the heavily cratered terrains resulted from an early period of 
volcanism, although clear volcanic sources for these ancient units have 
not yet been identified. The timing and areal extent of this proposed 
resurfacing on Mercury, and the specific role of volcanism, have been 
unknown. 

As is true for other terrestrial bodies, except the Earth and the Moon, 
Mercury’s relative geological chronology has been inferred from obser- 
vations of the impact crater record (see, for example, ref. 11), with 
absolute ages then extrapolated from the better-constrained lunar 
crater chronology'*"*. Here we measure crater size-frequency distri- 
butions for the most heavily cratered terrains on Mercury to determine 
their absolute ages. The age of the most heavily cratered terrains is an 
important benchmark for Mercury, because it provides an upper limit 
for the formation of subsequent major geological units such as the 
widespread volcanic smooth plains in the annulus surrounding the 
Caloris basin’ and in high northern latitudes of Mercury'’®. The cur- 
rently visible impactor population in the terrestrial planet region, 
namely near-Earth objects, is now well characterized for kilometre- 
sized asteroids'’, although impact rates and size distributions are less 
certain for earlier epochs. By using current models of the impact rate in 
the inner solar system, a model production function for lunar cratering 
has been developed and extrapolated to Mercury’. More recently, an 
independent model’* found comparable results. 


We initially identified the most heavily cratered terrains on Mercury 
using a preliminary global crater catalogue’®, then defined their boundaries
on a new MESSENGER global mosaic. We concentrated on two
regions of high crater density (Supplementary Fig. 1): the northern 
heavily cratered terrains (NHCT; Fig. 1) and a heavily cratered area at 
southern latitudes east of Rembrandt basin unseen by Mariner 10 
(Supplementary Fig. 1). The NHCT is a surviving remnant of a once 
larger heavily cratered terrain. Adjacent regions experienced more 
substantial resurfacing by the northern volcanic plains, the circum- 
Caloris volcanic plains, and by young basins east of the NHCT. The 
region east of Rembrandt was studied in a similar manner and pro- 
duced comparable results (Supplementary Fig. 2), so we restrict dis- 
cussion in this paper to the NHCT region. 

The next step was to use a model production function of craters to 
model the observed cumulative number of craters at least 25 km in 
diameter on NHCT (see Supplementary Fig. 2). The model production 
function was obtained’ by using an impactor size-frequency distri- 
bution resembling that of the main asteroid belt”’””°, which has pro- 
vided a suitable fit to old units on Mercury, the Moon and Mars. As can 
be seen, the model production function fits the NHCT data quite well, 
ensuring that our model is well suited to studying the early cratering 
on Mercury. 

Cratering data for the northern heavily cratered terrains are plotted 
in both cumulative and R-plot formats” in Fig. 2. The associated lunar 
data come from crater counts on specific ancient lunar terrains”. The 
pre-Nectarian terrains were defined as a particular portion of the nor- 
thern farside highlands. They represent one of the oldest lunar terrains, 
with a crater spatial density that slightly exceeds that of the NHCT on 
Mercury. The post-Nectarian crater size-frequency distribution is repre- 
sentative of terrains coeval with or younger than the Nectaris basin, a 
stratigraphic benchmark in lunar history”. 
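
The two plot formats mentioned above can be computed from a list of crater diameters and the counting area. The sketch below assumes the conventional definitions: the cumulative density is the number of craters with diameter at least D per unit area, and the R value for a diameter bin [Da, Db) is R = D̄³N / (A(Db − Da)), where D̄ is the geometric mean of the bin edges, N the number of craters in the bin and A the area. The crater list, area and bin edges are hypothetical, not data from this study.

```python
import math

def cumulative_density(diameters_km, area_km2, d_min_km):
    """Number of craters with diameter >= d_min_km per square kilometre."""
    return sum(1 for d in diameters_km if d >= d_min_km) / area_km2

def r_values(diameters_km, area_km2, bin_edges_km):
    """Return (geometric-mean diameter, R) for each diameter bin [lo, hi)."""
    out = []
    for lo, hi in zip(bin_edges_km[:-1], bin_edges_km[1:]):
        n = sum(1 for d in diameters_km if lo <= d < hi)
        d_mean = math.sqrt(lo * hi)
        out.append((d_mean, d_mean ** 3 * n / (area_km2 * (hi - lo))))
    return out

# Hypothetical counts on a hypothetical 10^6 km^2 counting area.
craters_km = [26, 28, 31, 33, 37, 41, 46, 52, 58, 66, 74, 90, 110, 150]
area_km2 = 1.0e6
edges_km = [25.0 * math.sqrt(2) ** k for k in range(7)]  # sqrt(2)-wide bins from 25 to 200 km

print(f"N(>=25 km) = {cumulative_density(craters_km, area_km2, 25.0) * 1e6:.0f} per 10^6 km^2")
for d_mean, r in r_values(craters_km, area_km2, edges_km):
    print(f"D = {d_mean:6.1f} km  R = {r:.4f}")
```

Because R divides out a D⁻³ power law, a population following that slope plots as a horizontal line, which makes departures such as the deficit of small craters easier to see than in the cumulative format.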

In general, we find that the spatial density of craters from the lunar 
and Mercurian terrains in Fig. 2b approaches empirical saturation 
equilibrium, which is thought to occur at R = 0.2-0.3, for diameters 
near 100 km (ref. 23), but they fall well below this level for craters that 
are considerably larger or smaller. The shapes of the NHCT and lunar 
pre-Nectarian terrains crater size-frequency distributions also resemble 
that of the lunar nearside highlands crater size-frequency distribution'”®. 
The question is whether these ancient units have reached crater satu- 
ration or whether they still represent the size-frequency distribution 
of the impactor population. The characteristics of the crater size- 
frequency distributions on the NHCT and pre-Nectarian terrains lead 
us to adopt the view that those terrains are in production (see Sup- 
plementary Information for discussion). 

To interpret the ancient crater size-frequency distribution on Mer- 
cury within the context of lunar chronology, we need to account for the 
differences between Mercury and the Moon concerning impact velo- 
cities, gravitational focusing and other factors that affect crater scaling 
relationships. Using the current Moon-crossing and Mercury-crossing 
asteroid populations derived from ref. 24, we find that on average 

Figure 1 | The northern heavily cratered terrains of Mercury. Crater 
measurements were made from a global mosaic with a resolution of 500 m per 
pixel based on MESSENGER images obtained during its first year orbiting 
Mercury. Our reliance on the global mosaic for actual crater measurements was 
augmented, for a few regions of poorer imaging and for some highly degraded 
craters, by a global digital terrain model having a resolution of about 1,300 m 
per pixel produced from wide-angle camera images by R. Gaskell (personal 
communication, 2012). a, Crater areal density (in number of craters at least 
25 km in diameter per 10⁶ km²) obtained by averaging over a radius of 300 km. 


about 3-3.5-fold as many craters in the size range relevant for this 
work (20-300 km) should form on Mercury as on the Moon (Sup- 
plementary Fig. 3). In this work we adopt a conservative factor of 3 
(valid at 20 km), a value consistent with an independent estimate’’. 
Furthermore, we adopted a recently revised early lunar chronology’ for 
ages older than 3 Gyr (Fig. 3). The earliest declining lunar bombard- 
ment was due to planetesimals left over from terrestrial planet formation. 
Beginning about 4.1 Gyr ago there was a spike in the bombardment rate 
(the LHB) due to asteroids ejected from the primordial asteroid belt by 
sweeping resonances in the wake of late giant planet migration”*, which 
declined over at least the subsequent 0.6 Gyr. This is manifested in the 
cumulative plot by the break in slope at 4.1 Gyr ago. We also plot in Fig. 3 

Figure 2 | Comparison of Mercury cratering data with key lunar units. The 
measured craters on the NHCT (blue triangles) are plotted in a cumulative 
form (a) and on an R-plot (b), obtained by normalizing the cumulative 
size-frequency distributions to a power law D ”, where D is the crater diameter. 
Pre-Nectarian terrains (red diamonds) encompass a portion of the lunar 
northern farside*”’’. The post-Nectarian crater size-frequency distribution 
(green circles) was obtained by taking the crater size-frequency distribution 
(for D < 300 km) found near or on terrains resurfaced by the formation of 
Nectaris basin” and then adding 12 post-Nectaris basins”, all of which had 
D > 300 km. Error bars correspond to Poisson statistics. 

Smaller craters were not included in the analysis because they can be heavily 
affected by secondary cratering and erasure (see, for example, ref. 15). The black 
line defines the northern heavily cratered terrains used in this paper. Although 
there are small regions within the outlined region of somewhat higher crater 
density, these may be statistical fluctuations and in any case would be such 
small counting areas that the statistics would be poor. b, Measured craters at 
least 25 km diameter (pink circles) overlaid on the digital terrain model. Basins 
at least 300 km in diameter are indicated by yellow circles. Only one such basin 
has been incorporated into the crater size-frequency distribution. 


the expected cumulative cratering flux for Mercury, appropriately 
scaled by the factor discussed above. The results show that the NHCT 
has an age of about 4.0-4.1 Gyr and is therefore likely to be several 
hundred million years younger than the most ancient lunar terrains 
(interpreted in ref. 6 to be about 4.4 Gyr old). Even if both the lunar and 
Mercurian heavily cratered terrains were in an empirical saturation 
equilibrium state, the age difference between the Moon and Mercury 
would be reduced but Mercury’s NHCT crater retention age would still 
be post-Nectarian. 


[Figure 3 graphic. Horizontal axis: Time (Gyr ago), 3.5 to 4.5.]


Figure 3 | Mercury and lunar crater chronologies. The solid black curve 
shows the number of lunar craters larger than 20 km per unit surface (N20), 
corresponding to the best model®. The slope transition at 4.1 Gyr ago marks the 
onset of the LHB*. The inferred age of the pre-Nectarian terrains is also shown, 
as well as a putative age for the Nectaris basin of 4.1 Gyr (refs 6, 20, 25). The 
black hatched region represents the envelope of uncertainties in the lunar 
chronology, as discussed in ref. 6. The Mercury crater chronology (red curve) is 
obtained by scaling the lunar chronology by a factor of 3. The model 
uncertainty in the factor of 3 is not considered because it lies within the 
chronology envelope. The horizontal green lines indicate the range of N20 
estimated for the NHCT (see Supplementary Fig. 2), which translates into a 
range of ages spanning from about 4.0 to about 4.1 Gyr ago. The horizontal blue 
lines indicate the range of N20 for the northern smooth plains (NSP; see 
Supplementary Fig. 2)'°, which translates into a range of ages spanning from 
about 3.55 to about 3.8 Gyr ago. 

We can arrive at a similar conclusion by looking at the record of 
large basins on the Moon and Mercury (that is, more than 300 km in 
diameter)*. On the Moon there are about 12-15 such basins younger 
than Nectaris’*”®. If the scaling factor derived above is applicable to 
large LHB-era impactors, which is plausible (see, for example, ref. 25), 
we would expect about 36-45 basins on Mercury to have accumulated 
over the same timescale. This number is close to the 46 + 7 basins 
(certain and probable) observed on Mercury’, and is consistent with 
our predicted younger age for Mercury’s surface. Moreover, the merged 
large basin size-frequency distribution and NHCT crater size-frequency 
distribution match the model production function remarkably well 
over nearly two orders of magnitude in crater sizes (see Supplementary 
Fig. 4). This strongly suggests that the entire surface of Mercury was 
resurfaced 4.0-4.1 Gyr ago and that the most ancient crater record 
(including all visible basins) was produced by impactors having a main 
belt-like size-frequency distribution. 
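
The basin-count comparison above is simple arithmetic and can be checked directly. The following Python sketch scales the 12-15 post-Nectaris lunar basins by the adopted Mercury/Moon factor of 3 and compares the result with the 46 ± 7 basins observed on Mercury. The variable names are illustrative, and treating "46 ± 7" as a plain interval is an assumption made here, not a statement about how the authors assessed agreement.

```python
# Numbers quoted in the text; all names are illustrative only.
LUNAR_BASINS = (12, 15)        # post-Nectaris lunar basins larger than 300 km
SCALING = 3                    # adopted Mercury/Moon crater production ratio
OBSERVED, OBSERVED_ERR = 46, 7 # certain and probable basins on Mercury

# Expected number of basins on Mercury over the same timescale.
expected = tuple(n * SCALING for n in LUNAR_BASINS)

# Assumption: read "46 +/- 7" as a simple interval and test for overlap.
observed_range = (OBSERVED - OBSERVED_ERR, OBSERVED + OBSERVED_ERR)
overlaps = expected[0] <= observed_range[1] and observed_range[0] <= expected[1]

print(f"expected basins on Mercury: {expected[0]}-{expected[1]}")
print(f"observed range: {observed_range[0]}-{observed_range[1]}, overlap: {overlaps}")
```

The expected range reproduces the 36-45 basins quoted in the text.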

The end of widespread smooth plains volcanism (see, for example, 
ref. 16) represents another benchmark in Mercury’s history. Using 
our model production function chronology, we find that the northern 
smooth volcanic plains’*, which along with the contemporary plains 
surrounding Caloris basin account for about 17% of the entire surface, 
were emplaced about 3.55-3.8 Gyr ago (Fig. 3 and Supplementary Fig. 2). 

These findings provide compelling evidence for a widespread pro- 
cess, probably volcanism, that erased up to hundreds of millions of 
years of Mercury’s earlier crater history. Moreover, the fact that the 
globally distributed large basins and the NHCT yield similar ages 
suggests that the resurfacing was global in nature (see, for example, 
ref. 10). Our data further indicate that widespread volcanism declined 
rapidly during the LHB relative to the Moon”, and ended about 3.55- 
3.8 Gyr ago. After that time, volcanism was much more restricted, 
occurring only in small patches or within large impact basins”. 

Widespread volcanism on Mercury was occurring at the same time 
as the increase in the impact flux at the start of the LHB period. From 
an impact statistics point of view, the onset of the LHB was probably 
followed by a slight delay before the first large basin-forming colli- 
sions took place. The fact that our age estimate for Mercury’s NHCT is 
slightly younger than the start of the LHB is consistent with heavy 
bombardment and basin formation occurring at the same time as 
global volcanism. Also significant is the cessation of major volcanism 
near the end of LHB basin formation, thus showing a temporal link 
between impact flux and volcanism. These findings, coupled with the 
prediction of a relatively thin lithosphere of Mercury”’, support the 
idea that large impacts may have triggered voluminous volcanism**”’. 
Vital remaining issues are to what extent and in what ways the impact 
process had a role in internal melt generation, ascent and eruption. 

Quantum fluctuations of the electromagnetic vacuum are responsible 
for physical effects such as the Casimir force and the radiative decay 
of atoms, and set fundamental limits on the sensitivity of measure- 
ments. Entanglement between photons can produce correlations that 
result in a reduction of these fluctuations below the ordinary vacuum 
level, allowing measurements that surpass the standard quantum 
limit in sensitivity’*. The effects of such ‘squeezed states’ of light 
on matter were first considered in a prediction’ of the radiative decay 
rates of atoms in squeezed vacuum. Despite efforts to demonstrate 
such effects in experiments with natural atoms’°, a direct quantita- 
tive observation of this prediction has remained elusive. Here we 
report a twofold reduction of the transverse radiative decay rate of 
a superconducting artificial atom coupled to continuum squeezed 
vacuum. The artificial atom is effectively a two-level system formed 
by the strong interaction between a superconducting circuit and a 
microwave-frequency cavity. A Josephson parametric amplifier is 
used to generate quadrature-squeezed electromagnetic vacuum. 
The observed twofold reduction in the decay rate of the atom allows 
the transverse coherence time, $T_2$, to exceed the ordinary vacuum 
decay limit, $2T_1$. We demonstrate that the measured radiative decay 
dynamics can be used to reconstruct the Wigner distribution of the 
itinerant squeezed state. Our results confirm a canonical prediction’ of 
quantum optics and should enable new studies of the quantum light- 
matter interaction. 

The quantization of the electromagnetic field implies a minimum 
uncertainty relation for non-commuting observables such as photon 
number and phase, or the two quadrature amplitudes, $X_1$ and $X_2$, of a 
mode of the electromagnetic field. The electromagnetic vacuum is a 
minimum-uncertainty state with quantum fluctuations distributed 
equally between the two quadratures. Parametric amplifiers operating 
in the optical'®’* and microwave’*” domain have been used to produce 
squeezed states of the electromagnetic field, wherein the fluctuations in 
one quadrature are increased and fluctuations in the other quadrature 
are reduced below the ordinary vacuum level, allowing for an improve- 
ment in measurement sensitivity’> . The focus of our research, however, 
is to reveal the effects of squeezed vacuum on the radiative properties of 
an atom. In the optical domain, only a few experiments have explored the 
squeezed light-atom interaction, in studies in free space’* and in a cavity 
quantum electrodynamics architecture’. Our experiment is in the micro- 
wave domain and uses a hybrid quantum bit (qubit)—an effective two- 
level atom formed by the strong light-matter dipole interaction between 
a superconducting circuit and a microwave-frequency cavity. We use 
a Josephson parametric amplifier to produce a spectrally broadband 
squeezed vacuum in the modes of a transmission line that are resonant 
with the atomic transition. The architecture of a one-dimensional radi- 
ative environment*””°*? and the strong coupling available in circuit 
quantum electrodynamics” enable us to engineer the radiative decay 
to be solely into the modes of the transmission line that are occupied by 
squeezed vacuum. Thus, we are able to explore the radiative decay 
dynamics of an atom in squeezed vacuum systematically. 


The itinerant electromagnetic field generated by a degenerate parametric 
amplifier may be approximately described in terms of the squeezing 
moments, $M$ and $N$, which are related to the frequency correlations 
of the output field. In the limit of large amplifier bandwidth, 
$\langle a^\dagger(\omega)\,a(\omega')\rangle = N\,\delta(\omega - \omega')$ and 
$\langle a(\omega)\,a(\omega')\rangle = M\,\delta(\omega + \omega' - 2\omega_0)$, 
where $a^\dagger(\omega)$ and $a(\omega')$ are respectively the creation and annihilation 
operators of the output field of the amplifier at frequencies $\omega$ and $\omega'$, 
$\omega_0$ is the centre frequency of the amplifier, $\delta(x)$ is the Dirac delta function 
and angle brackets denote expectation values. Squeezed states occur when 
$N < M \le \sqrt{N(N+1)}$. The radiative decay dynamics of an atom that 
couples to broadband squeezed vacuum centred at the atomic transition 
frequency is governed by the optical Gardiner-Bloch equations*: 


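$$
\begin{aligned}
\langle\dot{\sigma}_x\rangle &= -\gamma\,(N - M + 1/2)\,\langle\sigma_x\rangle \\
\langle\dot{\sigma}_y\rangle &= -\gamma\,(N + M + 1/2)\,\langle\sigma_y\rangle \\
\langle\dot{\sigma}_z\rangle &= -\gamma\,(2N + 1)\,\langle\sigma_z\rangle + \gamma
\end{aligned}
\qquad (1)
$$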


Here $\sigma_x$, $\sigma_y$ and $\sigma_z$ are the pseudospin operators for a two-level atom, 
$\gamma$ is the radiative linewidth of the atom and a dot denotes a time 
derivative. As shown in Fig. 1, the $X_2$ quadrature of the electromagnetic 
vacuum is squeezed, and a coherent field along this axis induces 
rotations of the atom about the y axis of the Bloch sphere. By setting 
$M = N = 0$ in equation (1), we recover the case of radiative decay into 
ordinary electromagnetic vacuum, where the transverse coherence 
decays half as fast as the longitudinal coherence ($T_2 = 2T_1 = 2/\gamma$). In 
contrast, the radiative decay into squeezed vacuum is characterized by 
the timescales $T_x = T_1/(N - M + 1/2)$, $T_y = T_1/(N + M + 1/2)$ and 
$T_z = T_1/(2N + 1)$, which respectively describe the radiative decay of 
coherence when the qubit is prepared along the ±x, ±y or ±z axes of 
the Bloch sphere. In the limit of large squeezing, it is predicted® that $T_y$ 
and $T_z$ are reduced and that the transverse decay time, $T_x$, is increased 
beyond the value of $2T_1$ owing to the reduced fluctuations in the $X_2$ 
quadrature of the vacuum. 
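
As a numerical illustration of the timescales that follow from equation (1), the Python sketch below evaluates $T_x$, $T_y$ and $T_z$ for given squeezing moments. The function name `decay_times` and the choice of units ($T_1 = 1$) are assumptions made for this example; the values $N = 0.88$ and $M = 1.08$ are those reported later in the text.

```python
def decay_times(n, m, t1):
    """Return (T_x, T_y, T_z) implied by equation (1), with gamma = 1/t1.

    n, m are the squeezing moments N and M; t1 is the ordinary-vacuum
    longitudinal decay time. Names are illustrative, not from the paper.
    """
    # Physical states require M <= sqrt(N(N+1)); a small tolerance is allowed.
    if m * m > n * (n + 1.0) + 1e-12:
        raise ValueError("M exceeds sqrt(N(N+1)); not a physical state")
    t_x = t1 / (n - m + 0.5)   # decay of coherence prepared along +/-x
    t_y = t1 / (n + m + 0.5)   # decay of coherence prepared along +/-y
    t_z = t1 / (2.0 * n + 1.0) # longitudinal decay
    return t_x, t_y, t_z

# Ordinary vacuum: transverse time is 2*T_1 in both directions.
vacuum = decay_times(0.0, 0.0, 1.0)

# Squeezed vacuum with the moments quoted in the text (units of T_1).
squeezed = decay_times(0.88, 1.08, 1.0)
print("vacuum  :", vacuum)
print("squeezed:", squeezed)
```

With these inputs the x-axis decay time exceeds the ordinary vacuum limit of $2T_1$, while the y-axis and longitudinal times fall below their vacuum values.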

A simplified schematic diagram of our experiment is shown in 
Fig. 1b. We realized an effective two-level system using the hybrid 
qubit formed by a superconducting transmon” circuit resonantly 
coupled to the transverse electric TE$_{101}$ mode of a three-dimensional 
(3D) superconducting cavity”. The transition frequency of the effective 
qubit was $\omega_q/2\pi = 5.8989$ GHz with a measured longitudinal 
decay time constant of $T_1 = 0.65 \pm 0.02$ μs (mean ± s.e.m.), set by 
deliberate coupling to the 50-Ω environment. In Supplementary 
Information, we show in detail that the radiative interaction of the 
hybrid qubit with squeezed vacuum is that of an idealized atom inter- 
acting directly with squeezed vacuum. We generated squeezed vacuum 
by pumping a lumped-element Josephson parametric amplifier (LJPA) 
with two tones at frequencies $\omega_1$ and $\omega_2$, which were evenly spaced 
about the qubit transition frequency” and satisfied 
$\omega_0 = (\omega_1 + \omega_2)/2 = \omega_q$. The bandwidth of the squeezing was 13 MHz, 
which is sufficient to fulfil the large bandwidth assumption based on the radiative 

Figure 1 | Simplified experiment set-up. a, The phase space of a mode of the 
electromagnetic field is described in terms of its quadrature amplitudes, $X_1$ and 
$X_2$. The Gaussian variance of the vacuum state is shown as a dashed line, and a 
squeezed state as the blue-grey region. b, A lumped-element Josephson 
parametric amplifier is used to generate squeezed vacuum that is coupled to the 
input port of a hybrid qubit (3D transmon) through a circulator with coaxial 
cables. A second port (not shown) is used for readout. This port is weakly 
coupled and has negligible influence on the relaxation. c, The state of a two-level 
atom may be represented on the Bloch sphere with angles $\theta$ and $\phi$ describing 
the latitude and longitude, respectively. d, The resonant strong light-matter 
dipole interaction of the transmon circuit with the 3D cavity results in two 
states, |+⟩ and |−⟩. The bandwidth of the squeezing is centred about $\omega_q$, the 
transition frequency between the ground state and |−⟩, and is large compared 
with the natural linewidth of the transition. 


linewidth of the qubit, $\gamma/2\pi = 240$ kHz. The output of the amplifier 
was connected with coaxial cables to the strongly coupled port of the 
superconducting cavity. 

To demonstrate the effect of squeezed vacuum on the transverse 
decay of the qubit, we conducted Ramsey measurements at different 
angles along the equator of the Bloch sphere. The Ramsey measurements 
consisted of an initial $\pi/2$ rotation about the $-x\cos(\phi) + y\sin(\phi)$ axis, 
followed by a second $\pi/2$ rotation about the 
$x\sin(\omega_{mod}t) - y\cos(\omega_{mod}t)$ axis, applied at variable time $t$. 
Modulation of the rotation angle of the second $\pi/2$ pulse at frequency 
$\omega_{mod}$ resulted in oscillatory Ramsey fringes without detuning. 
Figure 2a shows $\langle\sigma_z\rangle$ as a function of time and angle with the 
squeezing turned off; $\langle\sigma_z\rangle$ exhibits exponentially damped, sinusoidal 
oscillations with angular frequency $\omega_{mod}$ and phase $\phi$, with a uniform 
decay time $T_2^* = 1.08(4)$ μs. The fact that $T_2^*$ is less than $2T_1$ 
indicates the presence of a small amount of pure dephasing characterized 
by a timescale $T_\phi = 6.6 \pm 0.5$ μs. Figure 2b shows the 
results of the Ramsey measurement when the LJPA pump was turned 
on to generate squeezed vacuum for the variable duration between the 
first and second $\pi/2$ pulses. The power gain of the amplifier was 4 dB. 
The transverse decay in the presence of squeezed vacuum reveals two 
timescales, $T_x^* = 1.67$ μs and $T_y^* = 0.28$ μs, which describe the 
exponential decay of coherence when the qubit is prepared along the ±x and ±y 
axes, respectively. Subtracting the pure dephasing from the measured 
timescales gives the radiative transverse decay times, $T_x = 2.2$ μs and 
$T_y = 0.29$ μs. The interaction with squeezed vacuum both enhances 
decay along the y axis, owing to the increased fluctuations in the $X_1$ 
quadrature of the field, and suppresses decay along the x axis, owing to 
the reduced fluctuations in $X_2$. 
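
The dephasing correction described above can be reproduced with a few lines of Python. This is a minimal sketch that assumes decay rates add, so that 1/T(measured) = 1/T(radiative) + 1/T(pure dephasing); the function name and the use of microseconds as the unit are choices made for this illustration.

```python
def remove_dephasing(t_measured, t_phi):
    """Radiative decay time after subtracting the pure-dephasing rate.

    Assumes rates add: 1/t_measured = 1/t_radiative + 1/t_phi.
    """
    return 1.0 / (1.0 / t_measured - 1.0 / t_phi)

# Values quoted in the text, in microseconds.
T1 = 0.65        # measured longitudinal decay time
T_PHI = 6.6      # pure dephasing time
T_X_MEAS = 1.67  # measured decay time, qubit prepared along +/-x
T_Y_MEAS = 0.28  # measured decay time, qubit prepared along +/-y

# Consistency check with squeezing off: 2*T1 combined with T_PHI.
t2_star = 1.0 / (1.0 / (2.0 * T1) + 1.0 / T_PHI)

t_x = remove_dephasing(T_X_MEAS, T_PHI)
t_y = remove_dephasing(T_Y_MEAS, T_PHI)
print(f"T2* = {t2_star:.2f} us, T_x = {t_x:.2f} us, T_y = {t_y:.2f} us")
```

Under this assumption the quoted numbers are mutually consistent: the squeezing-off decay time comes out close to 1.08 μs, and the corrected times are close to the 2.2 μs and 0.29 μs reported.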

The radiative decay dynamics in the presence of squeezed vacuum 
can be presented as a trajectory of the Bloch vector. To illustrate this, 

Figure 2 | Transverse decay into squeezed vacuum. a, The Ramsey 
measurement as a function of angle consisted of a first $\pi/2$ rotation 
about the $-x\cos(\phi) + y\sin(\phi)$ axis, to prepare the qubit in the state $|\pi/2, \phi\rangle$ 
(in the notation of Fig. 1), followed by a second $\pi/2$ rotation about the 
$x\sin(\omega_{mod}t) - y\cos(\omega_{mod}t)$ axis applied at variable time $t$. The two- 
dimensional plot shows $\langle\sigma_z\rangle$ as a function of $t$ and $\phi$; $\langle\sigma_z\rangle$ is characterized by 
sinusoidal decay with a uniform decay constant $T_2^*$ and phase $\phi$. b, The 
transverse decay into squeezed vacuum was measured by turning the pump for 
the LJPA on between the qubit pulses. The two-dimensional plot indicates that 
after rapid decay of coherence along the ±y axes, the resulting coherence along 
the ±x axes decays with time constant $T_x > T_2^*$. c, The Ramsey measurement 
for the qubit prepared along the −y ($\phi = \pi$) axis with the squeezing off. d, e, The 
Ramsey measurement in the presence of squeezed vacuum for the qubit 
prepared along the +x ($\phi = \pi/2$) (d) and −y ($\phi = \pi$) (e) axes. 


we prepared the qubit in an initial state $|i\rangle$ with a $0.67\pi$ rotation about 
the $-x\cos(\phi) + y\sin(\phi)$ axis with $\phi = 0.83\pi$. After this preparation, 
the pump of the LJPA was turned on for a variable duration to generate 
squeezed vacuum. After this variable period of time, we tomographically 
reconstructed the qubit state either using $\pi/2$ rotations around the 
x and y axes followed by state readout in the $\sigma_z$ basis, to determine the 
Bloch vector components $\langle\sigma_y\rangle$ and $\langle\sigma_x\rangle$, respectively; or using no 
rotation and state readout in the $\sigma_z$ basis, to determine $\langle\sigma_z\rangle$. The 
trajectory of the Bloch vector, displayed in Fig. 3, follows the expected 
decay dynamics based on equation (1) with fast decay along the y and 
z axes and slow relaxation along x. The final state of the qubit is 
described by $\langle\sigma_x\rangle \to 0$, $\langle\sigma_y\rangle \to 0.07$ and $\langle\sigma_z\rangle \to 0.36$. The steady-state 
value of $\langle\sigma_z\rangle$ is consistent with a bath of $N = 0.88$ photons, which 
characterizes the average photon occupation of the squeezed state. 
The remnant coherence along the y axis is the result of a small coherent 
component of the squeezed state. In combination with the radiative 
decay of the qubit, this coherent drive, which is characterized by a Rabi 
frequency $\Omega_R \ll 1/T_1$, results in a steady-state coherence”, 
$\langle\sigma_y\rangle_{ss} \propto \Omega_R T_1$. From our measurements, we find that 
$\Omega_R \approx 2\pi \times 10$ kHz, consistent with the 65-dB on/off ratio of our 
qubit manipulation pulses. 
Because the qubit’s decay dynamics are sensitive to altered vacuum 
fluctuations, they can be used to probe squeezed states of light. Previ- 
ously, noise and correlation measurements have been used to char- 
acterize the squeezed states generated by microwave parametric 
amplifiers’’’°’”, Similarly, qubits have been used to reconstruct loca- 
lized non-classical states of light tomographically*”. Here we use 
the qubit’s decay dynamics to reconstruct, to second order, the 
Wigner distribution for the itinerant squeezed state generated by the 
LJPA. From $T_z$, the measured decay constant of $\langle\sigma_z\rangle$, and $T_x$, we 
determine that $N = 0.88$ and $M = 1.08$, from which the Gaussian variances 
$\sigma_1^2 = 2(N + M + 1/2)$ and $\sigma_2^2 = 2(N - M + 1/2)$ are calculated. 

Figure 3 | Radiative decay dynamics in squeezed vacuum. a-d, Quantum 
state tomography with measurements equally spaced between 0 and 3 μs shows 
the evolution of the Bloch vector between initial state $|i\rangle = |0.67\pi, 0.83\pi\rangle$ and 
final state $|f\rangle$. The dynamics are characterized by fast decay along y and z and 


Figure 3e shows the reconstructed Wigner distribution for the squeezed 
mode at frequency $\omega_0$. 
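
The inversion from decay constants to squeezing moments can be sketched as follows. Because the text does not quote a numerical value for the longitudinal decay constant, this Python example instead infers $N$ from the reported steady-state value of 0.36 for the z component, using the steady state of equation (1), 1/(2N + 1); that substitution is an assumption of this illustration. It then uses the radiative x-axis decay time of 2.2 μs and the intrinsic $T_1$ of 0.67 μs given in the Methods. All names are hypothetical.

```python
import math

# Inputs quoted in the text (times in microseconds).
SIGMA_Z_SS = 0.36   # steady-state <sigma_z> in squeezed vacuum
T1 = 0.67           # intrinsic radiative decay time (Methods)
T_X = 2.2           # radiative decay time for preparation along +/-x

# Steady state of equation (1): <sigma_z> = 1 / (2N + 1).
n = (1.0 / SIGMA_Z_SS - 1.0) / 2.0

# From T_x = T1 / (N - M + 1/2), solve for M.
m = n + 0.5 - T1 / T_X

# Gaussian variances as defined in the text (vacuum corresponds to 1).
var_antisqueezed = 2.0 * (n + m + 0.5)
var_squeezed = 2.0 * (n - m + 0.5)

# Physical states must satisfy M <= sqrt(N(N+1)).
m_max = math.sqrt(n * (n + 1.0))

print(f"N = {n:.2f}, M = {m:.2f} (max allowed {m_max:.2f})")
print(f"variances: {var_antisqueezed:.2f} and {var_squeezed:.2f}")
```

The result lands close to the reported $N = 0.88$ and $M = 1.08$, with one variance below and one above the vacuum level.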

In Fig. 4a, we display the effective decay constants for different 
values of the detuning $\delta = \omega_0 - \omega_q$ between the centre frequency of 
the pump and the qubit. The decay constants $T_x$ and $T_y$ depend 
strongly on the detuning near resonance, highlighting the tell-tale 
evidence of interaction with squeezed vacuum where $T_x > 2T_1$. 
When the detuning is large, the squeezing axis rotates rapidly with 
respect to the qubit axis and the decay times approach a constant value 
of $2T_1/(2N + 1)$. The solid black and red lines in Fig. 4a indicate the 
expected dependence of $T_x$ and $T_y$ on $\delta$ as discussed in Supplementary 
Information. 
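
The large-detuning limit quoted above can be made plausible with a short numerical check. The Python sketch below assumes, purely for illustration, that the transverse decay rate at angle θ from the squeezed axis interpolates between the x and y rates of equation (1) as γ(N + 1/2 − M cos 2θ), and that a rapidly rotating squeezing axis samples all angles uniformly. The full detuning dependence is given in the paper's Supplementary Information and is not reproduced here.

```python
import math

def transverse_rate(theta, n, m, gamma):
    """Assumed angular interpolation between the x and y rates of eq. (1).

    theta = 0 gives gamma*(N - M + 1/2); theta = pi/2 gives gamma*(N + M + 1/2).
    """
    return gamma * (n + 0.5 - m * math.cos(2.0 * theta))

def averaged_decay_time(n, m, t1, steps=3600):
    """Decay time when the rate is averaged uniformly over all angles."""
    gamma = 1.0 / t1
    total = sum(
        transverse_rate(2.0 * math.pi * k / steps, n, m, gamma)
        for k in range(steps)
    )
    return steps / total

N, M, T1 = 0.88, 1.08, 0.67  # values quoted in the text (T1 in microseconds)
t_avg = averaged_decay_time(N, M, T1)
t_limit = 2.0 * T1 / (2.0 * N + 1.0)
print(f"angle-averaged decay time: {t_avg:.4f} us, 2*T1/(2N+1): {t_limit:.4f} us")
```

Under these assumptions the M-dependent term averages to zero, leaving the constant value stated in the text.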

In Fig. 4b, we display the decay constants measured for different bias 
conditions of the LJPA, obtained by changing the power of the pump 
tones. The transverse decay rates were measured as shown in Fig. 2, 
and we use the measured value of $T_z$ to determine $N$. As expected, 
larger LJPA gain results in larger amounts of squeezing with an associated 
increase in $T_x$ and decrease in $T_y$ and $T_z$. Figure 4c displays 
$M - N$ versus $N$. The reduction of $M$ from its maximum allowed value, 
shown as a dashed line, may be attributed to two possible sources: 
losses in the microwave components between the LJPA and the qubit, 


slow decay along x. e, From the radiative decay rates, we tomographically 
reconstruct the Wigner quasiprobability distribution of the itinerant squeezed 
vacuum mode at $\omega_0$. Inset, Gaussian half-widths of the squeezed (solid) and 
vacuum (dashed) states. 


and non-ideal performance of the LJPA characterized by added ther- 
mal noise. If we assume that the LJPA produces an ideal squeezed state, 
with $M = \sqrt{N(N + 1)}$, then the degradation can be accounted for by 
an attenuation of the squeezed vacuum from the LJPA by a factor of 
$\eta = 0.5$. Attenuation degrades the squeezed vacuum by absorbing 
correlated photons, thereby making the quadrature fluctuations tend 
towards the ordinary vacuum fluctuations. This level of attenuation 
is consistent with the anticipated insertion loss between the LJPA and 
the qubit due to the microwave components we used. For $N > 1$, 
however, it seems that the performance of the LJPA may become 
non-ideal as indicated by the slight reduction in $M - N$ for $N > 1$. 
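
The effect of attenuation on an ideal squeezed state can be illustrated numerically. The Python sketch below assumes a simple linear-loss model in which a fraction η of the field is transmitted and the remainder is replaced by ordinary vacuum, so that both moments scale as N → ηN and M → ηM. This loss model and the function name are assumptions of the example; the paper's own treatment may differ in detail.

```python
import math

def attenuate_ideal_squeezing(n_source, eta):
    """Moments (N, M) after loss eta applied to an ideal squeezed state.

    Assumed model: the source emits a minimum-uncertainty state with
    M = sqrt(N(N+1)); linear loss scales both moments by eta.
    """
    m_source = math.sqrt(n_source * (n_source + 1.0))
    return eta * n_source, eta * m_source

ETA = 0.5
N_AT_QUBIT = 0.88  # value quoted in the text

# Source photon number needed to deliver N_AT_QUBIT after attenuation.
n, m = attenuate_ideal_squeezing(N_AT_QUBIT / ETA, ETA)
m_ideal = math.sqrt(n * (n + 1.0))  # maximum M allowed for this N

print(f"N = {n:.2f}, M = {m:.2f}, ideal M for this N = {m_ideal:.2f}")
```

With η = 0.5 the model gives an M slightly above 1.1, below the ideal limit for the same N and close to the measured value of 1.08.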

Our results demonstrate the ability to alter the vacuum environment 
of a two-level atom to a degree that has so far not been achieved in 
atomic and molecular systems, allowing the direct study of a long- 
sought aspect of the light-matter interaction. Our system also demon- 
strates the strength of using superconducting artificial atoms as sens- 
itive detectors of the quantum states of the electromagnetic field. 
Future studies with squeezed light and superconducting qubits may 
enhance the fidelity of quantum gates, enable the generation of multi- 
qubit entanglement” and allow the study of non-Markovian quantum 
baths. 

Figure 4 | Dependence of the transverse and longitudinal decay times on 
LJPA detuning and bias. a, Effective decay constants versus the detuning of 
the centre frequency of the LJPA from the qubit, $\delta = \omega_0 - \omega_q$. The decay 
constants $T_x$ and $T_y$ show a dependence on the detuning of the squeezing from 
the qubit transition frequency: on resonance $T_x$ reaches its maximum value and 
$T_y$ reaches its minimum value. The ordinary vacuum decay limit, $2T_1$, is shown 
for comparison. The solid lines show the theoretical dependence of $T_x$ (black) 
and $T_y$ (red) on the detuning. The upper sketch indicates how detuning causes 
the squeezing ellipse to rotate relative to the qubit coordinates. b, Measured 
values of $T_x$, $T_y$ and $T_z$ for increasing LJPA gain versus $N$. Error bars indicate the 
s.e.m. based on 10 successive measurements. The upper panel indicates how the 
aspect ratio of the squeezed state changes for increasing $N$. c, $M - N$ versus $N$. 
The dashed line indicates a minimum-uncertainty squeezed state, which is 
expected for ideal squeezing. The solid line indicates the expected dependence 
for a quantum efficiency of $\eta = 0.5$. The grey region indicates values of $M$ and $N$ 
that are forbidden and the red region indicates values of $M$ and $N$ that 
correspond to classical states of light. 

METHODS SUMMARY 


Owing to the finite temperature of the 50-Ω environment and other sources of 
noise, a small average number of photons, $N_{th}$, are expected to contaminate the 
vacuum environment of the qubit. This bath of thermal photons both reduces the 
measured energy decay time $T_1$ from its intrinsic value by a factor of $1/(2N_{th} + 1)$ 
and increases the equilibrium excited-state population. We determined the 
equilibrium excited-state population to be 1.8% using a Rabi population measurement”, 
allowing us to place an upper limit on the number of thermal photons that 
characterize our vacuum environment, of $N_{th} = 0.019$, and, thus, on the intrinsic 
radiative decay time, $T_1 = 0.67$ μs. We observe a transverse decay time of $T_x > 2T_1$, 
indicating that, even in the presence of thermal photons, we have demonstrated 
interaction with fluctuations below the ordinary vacuum level. Although their 
effects are small, these thermal photons were included in our determination of $N$ 
and $M$. 
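
The thermal-photon estimate above can be reproduced approximately. The Python sketch below assumes the standard detailed-balance relation for a two-level system in a thermal bath, excited-state population = N_th/(2N_th + 1), which is not stated explicitly in the text, and then applies the 1/(2N_th + 1) factor quoted above to recover the intrinsic decay time. Variable names are illustrative.

```python
# Inputs quoted in the text.
P_EXCITED = 0.018    # equilibrium excited-state population (1.8%)
T1_MEASURED = 0.65   # measured energy decay time, microseconds

# Assumption: detailed balance for a two-level system in a thermal bath,
# P_e = N_th / (2 N_th + 1), inverted for N_th.
n_th = P_EXCITED / (1.0 - 2.0 * P_EXCITED)

# The text states the measured T1 is the intrinsic value times 1/(2 N_th + 1).
t1_intrinsic = T1_MEASURED * (2.0 * n_th + 1.0)

print(f"N_th = {n_th:.3f}, intrinsic T1 = {t1_intrinsic:.2f} us")
```

Both results agree with the quoted upper limit on the thermal photon number and the intrinsic decay time of 0.67 μs to within rounding.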

The qubit was composed of a transmon circuit with charging energy $E_C/h = 
208$ MHz and Josephson energy $E_J/h = 23.27$ GHz, coupled to a 3D aluminium 
cavity with resonance frequency $\omega_c/2\pi = 6.0456$ GHz at rate $g/2\pi = 126$ MHz. 
The cavity was equipped with two ports: a strongly coupled port that limited the 
quality factor to $Q = 1.1 \times 10^4$ and a weakly coupled port. Outgoing signals from 
the qubit passed through several circulators and filters before amplification with 
a HEMT (high-electron-mobility transistor) amplifier at 2.7 K. The qubit was 
enclosed in successive layers of superconducting and magnetic shields and anchored 
to the mixing chamber stage of a dilution refrigerator with a base temperature of 
20 mK. State readout was performed using the Jaynes-Cummings nonlinearity 
technique” by driving the weakly coupled port of the cavity at 6.0467 GHz and 
integrating the first 200 ns of transmitted signal. Further details of the experimental 
set-up are given in Supplementary Information. 

The LJPA was composed of a two-junction SQUID (superconducting quantum 
interference device) formed of 1-μA Josephson junctions shunted with 1 pF of 
capacitance and isolated from the input ports of a 180° hybrid coupler with 
interdigitated capacitors that resulted in a quality factor of $Q_{LJPA} = 100$. The 
LJPA was flux-biased to have a low-power resonance at 5.897 GHz. The differential 
port of the hybrid coupler was connected to the strongly coupled port of the qubit 
with coaxial lines through two circulators and a −20-dB coupler that allowed the 
injection of the qubit manipulation pulses. Using numerical modelling of the 
LJPA, we found that the values of $N$ obtained from the $T_z$ timescale were consistent 
with the measured power gain of the amplifier at each bias condition. Additional 
characterization of the LJPA is described in Supplementary Information. 

A single microwave source was used to generate the qubit preparation, tomo- 
graphy and LJPA pump pulses. The LJPA pump was obtained by driving an I-Q 
mixer with a tone at 540 MHz and adjusting the d.c. offsets to suppress the carrier. 

Hydrogenases are the most active molecular catalysts for hydrogen 
production and uptake’, and could therefore facilitate the develop- 
ment of new types of fuel cell’°. In [FeFe]-hydrogenases, catalysis 
takes place at a unique di-iron centre (the [2Fe] subsite), which 
contains a bridging dithiolate ligand, three CO ligands and two 
CN⁻ ligands®’. Through a complex multienzymatic biosynthetic 
process, this [2Fe] subsite is first assembled on a maturation 
enzyme, HydF, and then delivered to the apo-hydrogenase for 
activation®. Synthetic chemistry has been used to prepare remark- 
ably similar mimics of that subsite’, but it has failed to reproduce 
the natural enzymatic activities thus far. Here we show that three 
synthetic mimics (containing different bridging dithiolate ligands) 
can be loaded onto bacterial Thermotoga maritima HydF and then 
transferred to apo-HydA1, one of the hydrogenases of Chlamy- 
domonas reinhardtii algae. Full activation of HydA1 was achieved 
only when using the HydF hybrid protein containing the mimic 
with an azadithiolate bridge, confirming the presence of this ligand 
in the active site of native [FeFe]-hydrogenases”’®. This is an 
example of controlled metalloenzyme activation using the combi- 
nation of a specific protein scaffold and active-site synthetic ana- 
logues. This simple methodology provides both new mechanistic 
and structural insight into hydrogenase maturation and a unique 
tool for producing recombinant wild-type and variant [FeFe]- 
hydrogenases, with no requirement for the complete maturation 
machinery. 

Complexes 1''”’, 2"* and 3’° (Fig. 1a) represent the closest synthetic 
mimics of the [2Fe] subsite in HydA1. They all share the same primary 
coordination sphere with four CO, two CN⁻ and a bridging dithiolate 
ligand. They do however differ in the nature of the central bridgehead 
atom of the dithiolate: carbon in 1, nitrogen in 2 and oxygen in 3. The 
nature of this atom in the enzyme [2Fe] subsite has been a matter of 
controversy””"°'®, Anaerobic reaction of HydF from T. maritima 


Figure 1 | Structures of the di-iron clusters discussed in the study. 
a, The synthetic mimics 1"'"’, 2" and 3”°. b, Proposed structure for the 
x-HydF (x = 1-3) hybrid proteins. c, The H-cluster (active site) of 


(expressed in Escherichia coli), containing a [4Fe-4S] cluster and 
named HydF in the following, with a tenfold molar excess of complex 
1, 2 or 3, led to new hybrid species, x-HydF (x = 1, 2 or 3, respectively), 
that could be isolated in pure form and characterized. Indeed, in all 
cases, iron quantification showed an increase from 3.9 ± 0.4 to 
5.6 ± 0.4 iron atoms per protein, and the ultraviolet-visible spectrum 
of these hybrids displayed features consistent with a ~1:1 ratio of the 
synthetic complexes and the HydF protein (Supplementary Fig. 1a-c). 

Fourier transform infrared (FTIR) spectroscopy is a convenient method 
for characterizing metalloproteins such as hydrogenases containing 
CO and CN⁻ ligands. Thus, further evidence for the incorporation 
of synthetic complexes in HydF was obtained from their FTIR spectra, 
which contained CN⁻-stretching bands between 2,000 and 2,100 cm⁻¹ 
and four partly overlapping CO-stretching bands in the 1,800-2,000 cm⁻¹ 
range (Fig. 2b and Supplementary Table 1). The high-energy bands underwent 
a 40 cm⁻¹ shift on ¹³C-labelling of the CN⁻ ligands (Supplementary 
Fig. 2). Interestingly, the width of the FTIR bands is still identical to 
that of the unbound complexes (Fig. 2a) but their positions show strong 
similarities to those of Clostridium acetobutylicum HydF (Fig. 2b and 
Supplementary Table 1), a HydF preparation isolated from a strain 
of C. acetobutylicum expressing the complete maturase machinery 
(including HydE and HydG). Clostridium acetobutylicum HydF contains, 
in addition to a [4Fe-4S] cluster, a still-undefined [2Fe] centre 
and is capable of activating the apo form of HydA1. Although the 
width of the FTIR bands of the hybrids would suggest a ligand con- 
formational freedom similar to that of the unbound complexes, the 
position of the FTIR bands is a clear indication that the synthetic 
complexes closely mimic the natural [2Fe] subsite in HydF. 

The arrangements in which the synthetic complexes are bound 
to HydF and its [4Fe-4S] cluster are not evident from the FTIR spectra. 
In particular, FTIR spectroscopy does not allow terminal and bridg- 
ing cyanide ligands to be definitively distinguished (see below and 


[FeFe]-hydrogenase. The protein ribbon and the [4Fe-4S] clusters (shown as 
balls and sticks with Fe shown as white spheres) are shown only schematically. 
Figure 2 | Normalized FTIR spectra recorded in liquid solution at 15 °C. 
a, Complexes 1-3. b, Clostridium acetobutylicum (Ca)HydF (from ref. 19) and 
x-HydF (x = 1-3) hybrid species. c, HydA1 after treatment of apo-HydA1 with 
1-HydF (1-HydA1), 2-HydF (2-HydA1) and 3-HydF (3-HydA1). Peak 
positions in the spectrum of 2-HydA1 are colour coded to indicate the 
contributions from Hox (red), Hred (violet), Hsred (green) and Hox-CO (blue) 
(see Supplementary Fig. 6 for a complete data set). d, HydA1 from 

C. reinhardtii expressed in C. acetobutylicum (CrHydA1) (ref. 18). Colour code 
as in c. Samples of complexes 1-3 and x-HydF (x = 1-3) were prepared in 
HEPES buffer (20 mM, 100 mM KCl) pH 7.5. Samples of x-HydA1 were 
prepared in 10 mM Tris-HCl buffer pH 8 containing sodium dithionite 
(2mM). 


Supplementary Discussion). Electron paramagnetic resonance (EPR) 
and two-dimensional hyperfine sub-level correlation (HYSCORE) 
spectroscopies are more powerful in this respect. They demonstrate 
a close interaction between the cluster and the synthetic complex, as 
revealed for the case of 1-HydF. First, the EPR spectrum of the spin-half 
(S = 1/2) [4Fe-4S]¹⁺ cluster in dithionite-reduced 1-HydF was 
markedly different from that of the reduced cluster in unloaded HydF, 
with the high-field feature at g = 1.90 (g is the Landé factor) in HydF 
shifted to g = 1.93 in 1-HydF (Fig. 3a). A comparable shift was 
observed in the case of hybrids 2-HydF and 3-HydF (Supplementary 
Fig. 3). The absence of additional signals indicated that, in all 
cases, the synthetic complex remained in the EPR-silent Fe(I)Fe(I) state 
(both iron centres are in a low-spin S = 1/2 configuration but are 
antiferromagnetically coupled, leading to a diamagnetic S = 0 ground state). 
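
As a rough numerical illustration of the g shift described above, the resonance condition hν = g·μB·B converts the two g values into field positions at the 9.39 GHz microwave frequency quoted in the legend of Fig. 3. This is only a sketch for orientation: the function name is illustrative, and the physical constants are standard CODATA values rather than anything taken from the paper.

```python
# Continuous-wave EPR resonance condition: h * nu = g * mu_B * B
PLANCK_H = 6.62607015e-34          # J s (exact SI value)
BOHR_MAGNETON = 9.2740100783e-24   # J/T (CODATA 2018)


def resonance_field_mT(g, microwave_ghz):
    """Return the field (in mT) at which a signal with the given g value resonates."""
    field_tesla = PLANCK_H * microwave_ghz * 1e9 / (g * BOHR_MAGNETON)
    return field_tesla * 1e3


if __name__ == "__main__":
    for g in (1.90, 1.93):
        print(f"g = {g:.2f} -> {resonance_field_mT(g, 9.39):.1f} mT")
```

At 9.39 GHz the two features fall near 353 mT and 348 mT, so the reported change in g corresponds to a field shift of roughly 5 mT.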

Second, pulsed EPR spectroscopy unambiguously demonstrated 
that the [4Fe-4S] cluster and the [2Fe] subsite analogue shared a 
CN⁻ ligand in 1-HydF. For this purpose we used a nuclear 
coherence-transfer experiment, CF-NF, correlating the combination 
frequencies (CF) with the nuclear frequencies (NF), which is more 
sensitive than HYSCORE spectroscopy for disordered systems and 
best adapted for the observation of ¹³C signals in the presence of weakly 
coupled ¹⁴N atoms. This is the first time to our knowledge that a 
metalloprotein has been characterized in such an experiment. The 
CF-NF spectrum of 1-HydF (Fig. 3b, right) displayed peaks from 
distant ¹³C carbon atoms present in natural abundance. When 1 was 
prepared with ¹³C-labelled CN⁻, the spectrum of 1-HydF displayed a 
new feature reflecting coupling of the unpaired electron in the [4Fe-4S] 
cluster with the ¹³C nucleus, characterized by a hyperfine coupling 
constant of 4.0 ± 0.2 MHz (Fig. 3b, left). As shown in Supplementary 
Fig. 4, the HYSCORE spectrum of reduced 1-HydF displayed 
an additional feature consistent with the presence of a nitrogen atom 
weakly coupled to the [4Fe-4S] cluster in 1-HydF. The hyperfine 
coupling constant (a_N = 1 MHz) is significantly smaller than those 
generally obtained when a N atom is directly coordinated to an Fe-S 
cluster (a_N in the range of 4-7 MHz). 
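
The position of the ¹³C feature in the CF-NF spectrum can be sanity-checked with a short calculation. The legend of Fig. 3 places the ridge at 7.7 MHz, equal to twice the ¹³C Zeeman frequency, and the Methods give a static field of 3,600 G. The sketch below assumes the standard tabulated gyromagnetic ratio of ¹³C (γ/2π ≈ 10.708 MHz per tesla), which is not stated in the text; the function name is illustrative.

```python
# gamma / (2 * pi) for 13C in MHz per tesla (standard tabulated value, an assumption here)
GAMMA_13C_MHZ_PER_T = 10.7084


def larmor_frequency_mhz(field_gauss, gamma_mhz_per_t=GAMMA_13C_MHZ_PER_T):
    """Nuclear Zeeman (Larmor) frequency in MHz for a field given in gauss."""
    return gamma_mhz_per_t * field_gauss * 1e-4   # 1 G = 1e-4 T


if __name__ == "__main__":
    nu_13c = larmor_frequency_mhz(3600.0)
    print(f"13C Zeeman frequency:      {nu_13c:.2f} MHz")
    print(f"combination frequency 2nu: {2.0 * nu_13c:.2f} MHz")
```

Under these assumptions twice the ¹³C Zeeman frequency comes out close to 7.7 MHz, in line with the ridge position quoted in the figure legend.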

These coupling constants are consistent with a CN⁻ ligand bridging 
one iron of the [4Fe-4S] cluster and one iron of 1, as established by 
density functional theory (DFT) calculations (a detailed description 
of the DFT calculations is provided as Supplementary Discussion and 
in Supplementary Tables 2-5). More precisely, computed hyperfine 
coupling constants indicate that the cyanide C atom is bound to one 
iron atom belonging to a mixed-valence (Fe2.5+) iron of the [4Fe-4S] 
cluster and that the N atom is bound to the di-iron complex, implying 
cyanide linkage isomerism on formation of 1-HydF, as found in the 
synthesis of Prussian blue analogues and other molecular metal 
clusters. Furthermore, DFT-calculated values of CO and CN⁻ stretching 
frequencies (2,010 and 2,060 cm⁻¹) of a 1-HydF model, containing 
a CN⁻ ligand bridging the [4Fe-4S] cluster and complex 1, are well in 
the range of the experimental values (2,038 and 2,055 cm⁻¹; see 
Supplementary Discussion and Supplementary Table 6). 

The hybrid x-HydF proteins were studied for their potential to 
activate apo-HydA1 from C. reinhardtii containing a single [4Fe-4S] 
cluster and no [2Fe] subsite. A pure preparation of apo-HydA1 was 
incubated anaerobically with 10 equiv. of the hybrid protein, the 
optimal excess ratio (Supplementary Fig. 5), in phosphate buffer pH 
6.8 at 37 °C for 30 min, and hydrogen evolution was monitored under 
standard conditions (Methods). No H₂ evolution could be detected 
using HydF, 1-HydF or 3-HydF (Fig. 4). In contrast, vigorous H₂ 
evolution was observed using 2-HydF, corresponding to a specific 
activity of 700-800 µmol H₂ per min per mg HydA1, comparable to 
the activity values reported for wild-type HydA1, thus indicating 
complete maturation/activation of HydA1 by 2-HydF (Fig. 4 and 
Supplementary Fig. 5). Furthermore, activation by 2-HydF was more 
efficient than by C. acetobutylicum HydF (specific activity, 350-400), 
assayed under the same conditions (Fig. 4). Indeed, C. acetobutylicum 
HydF provided full activation only when present in larger excess. 
2-HydF by itself did not show any hydrogenase activity. Finally, apo-HydA1 
was treated with a fourfold excess of x-HydF (x = 1, 2 or 3), 
under reducing conditions, then separated from HydF by affinity 
chromatography and analysed by FTIR spectroscopy. In all cases, the 
Figure 3 | Continuous wave and pulsed EPR spectra of 1-HydF. a, X-band 
EPR spectra recorded at 10 K for dithionite-reduced 1-HydF (red line) and 
HydF (black line) in 50 mM Tris-HCl buffer, 150 mM NaCl, 5 mM sodium 
dithionite, pH 8. Microwave power, 100 µW; modulation amplitude, 1 mT; 
microwave frequency, 9.39 GHz. The shoulder observed at g = 1.90 in the 
1-HydF spectrum, corresponding to a few per cent of the total signal intensity, 
is assigned to a small fraction of HydF lacking 1. b, X-band two-dimensional 
pulsed electron spin echo envelope modulation (ESEEM) spectroscopy 


presence of characteristic narrow Fe-CO and Fe-CN bands demonstrated 
that the synthetic complex had been transferred from HydF to 
HydA1 (Fig. 2c and Supplementary Fig. 6). The FTIR spectrum of 
HydA1 after treatment with 2-HydF shows a strong correspondence 
to that of fully active wild-type HydA1 (Fig. 2d). Specifically, both 
species exist as a mixture of oxidized (Hox), reduced (Hred) and super-reduced 
(Hsred) redox states of the H-cluster (the H-cluster is the complete 
active site of HydA shown in Fig. 1c) that all participate in the catalytic 
cycle. Furthermore, after flushing with CO, a complete conversion to 
Hox-CO occurred (Supplementary Fig. 7). 

These data demonstrate that 2 is efficiently transferred from HydF 
to apo-HydA1, where it acquires the structure of the natural active 
[2Fe] subsite. This implies isomerization of one CN⁻ ligand, replacement 
of one CO by a cysteinate ligand of the proximal [4Fe-4S] cluster 
in HydA1 and conformational rearrangement to adopt the inverted 
square pyramid structure required for opening a substrate binding site 
on the distal iron atom of the [2Fe] subsite (Fig. 1). We note that 
1-HydA1 and 3-HydA1 both show ‘H-cluster-like’ FTIR signatures. 
In fact, the FTIR spectrum of 1-HydA1 has strong similarities with the 
Hox state (Fig. 2c and Supplementary Fig. 6) whereas the FTIR spectrum 
of 3-HydA1 does not resemble that of any known H-cluster redox 
state, but seems to indicate a pure redox state and even shows a band 
assigned to a bridging CO. 

Besides unequivocally demonstrating that nitrogen is the bridge- 
head atom in the dithiolate ligand of the H-cluster, these results shed 
Figure 4 | Specific hydrogenase activity of reconstituted HydA1. Activity of 
HydA1 (µmol H₂ per min per mg HydA1) was measured in the presence of 
methyl viologen (10 mM) and sodium dithionite (100 mM) after in vitro 
maturation of apo-HydA1 for 30 min at 37 °C with 10 equiv. of 
x-HydF (x = 1-3), HydF or C. acetobutylicum (Ca)HydF. The value for the last 
was obtained after a 60-min reaction and was taken from ref. 23. Data show 
mean ± s.d. 
(CF-NF) of 1-HydF labelled with ¹³CN⁻ (left) and unlabelled 1-HydF (right). 
The horizontal ridge seen at a frequency ν₂ of 7.7 MHz along the frequency 2 
axis (ν₂ being equal to 2ν(¹³C), with ν(¹³C) the Zeeman frequency of a ¹³C nuclear 
spin) is attributed to a hyperfine interaction, a(¹³C), between a ¹³C nucleus and 
the paramagnetic [4Fe-4S] cluster. Its extension (Δν₁) along the frequency 1 
axis yields the magnitude of the coupling (Δν₁ = 4.0 MHz = a(¹³C)). This feature 
is absent from the unlabelled 1-HydF spectrum. 


light on a number of important questions regarding hydrogenase maturation. 
They strongly support the hypothesis that HydF transiently 
binds a di-iron precursor of the active [2Fe] subsite of HydA1 and 
suggest stabilization through interactions with the [4Fe-4S] cluster. 
The structure of this natural precursor is likely to be very close to that 
of 2. Further investigation of HydA1 maturation by the hybrid system, 
combining site-directed mutagenesis experiments and synthetic manipulation 
of the [2Fe] subsite (for example, isotopic labelling as shown 
here with ¹³CN⁻), will probably provide additional insight into the 
transfer mechanism and the structure of both HydF and HydA1 binding 
sites. These data also demonstrate the unique properties of the HydA1 
protein binding pocket in converting the otherwise inactive complex 2 
into an active catalyst. More importantly, this novel artificial hybrid 
maturase system provides a unique, simple and straightforward bio- 
technological tool for producing active recombinant hydrogenases, 
with no requirement for coexpression with the still incompletely char- 
acterized complex biosynthetic machinery. 

Because this procedure has been shown to work with proteins (HydF 
from T. maritima and HydA1 from C. reinhardtii) from two comple- 
tely different organisms, it is very likely that [FeFe]-hydrogenases from 
other microorganisms, overexpressed in their apo form in E. coli (which 
lacks the maturation machinery), could also be activated through sim- 
ple reaction with 2-HydF. This reaction could thus be used for explor- 
ing a large variety of [FeFe]-hydrogenases—for example, from different 
species or derived from directed mutagenesis—with the aim of finding 
the most active and stable enzymes for exploitation in biotechnological 
processes of H₂ production as well as in bioelectrodes in (photo)electrolysers 
or fuel cells. 


METHODS SUMMARY 


Recombinant T. maritima HydF protein was isolated, and its [4Fe-4S] cluster introduced 
using enzymatic reconstitution, as previously described. The synthetic complexes 
1, 2 and 3 were prepared as previously described with slight modifications 
of the purification procedures described in the Supplementary Information. 
Hybrid proteins (1-HydF, 2-HydF and 3-HydF) were prepared under strictly 
anaerobic conditions in a glove-box. In a standard experiment, 150 µM HydF in 
50 mM Tris-HCl, 150 mM NaCl buffer, pH 8, was incubated with a tenfold molar 
excess of the complex (1, 2 or 3) for 30 min. The protein was then desalted on a 
NAP-25 cartridge (GE Healthcare) and concentrated with Amicon Ultra centrifugal 
filters 10K (Millipore). The protein was stored in liquid nitrogen. 

In vitro maturation of apo-HydA1 from C. reinhardtii overexpressed in E. coli 
by the HydF hybrids was assayed as previously described. Apo-HydA1 was 
incubated in 0.1 M potassium phosphate buffer pH 6.8, 2 mM sodium dithionite 
with an excess of the respective x-HydF hybrid protein for 30 min at 37 °C in a total 
volume of 400 µl. The specific hydrogenase activity was determined as described 
by transferring the maturation solution to a 1.6 ml reaction mixture containing 
100 mM sodium dithionite and 10 mM methyl viologen in the same buffer. For 
FTIR measurements, apo-HydA1 and 4 equiv. x-HydF were incubated in 10 mM 
Tris-HCl pH 8.0, 2 mM sodium dithionite for 1 h at 37 °C and HydA1 was purified 
through a strep tag affinity column. 


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS 


All chemicals were purchased from Sigma-Aldrich and used as received 
unless otherwise stated. NMR spectra were recorded on a Bruker AC300 using the 
residual solvent peak as internal standard. Complex (Et₄N)₂[Fe₂(adt)(CO)₄(CN)₂] 
(2, adt²⁻ = 2-azapropanedithiolate) was synthesized following literature 
procedure, and (Et₄N)₂[Fe₂(pdt)(CO)₄(CN)₂] (1, pdt²⁻ = propanedithiolate), 
(Et₄N)₂[Fe₂(pdt)(CO)₄(¹³CN)₂] and (Et₄N)₂[Fe₂(odt)(CO)₄(CN)₂] (3, odt²⁻ = 
2-oxopropanedithiolate) were prepared by modified literature procedures (see 
Supplementary Information) and their thin-film solution FTIR spectra were 
recorded on a Perkin Elmer Spectrum-100 spectrometer. TmHydF (referred to 
as HydF throughout the text) was overexpressed, purified and its [4Fe-4S] cluster 
reconstituted as previously described. Apo-CrHydA1 (referred to as apo-HydA1 
throughout the text) was overexpressed in E. coli BL21 DE3 ΔiscR using growth 
conditions and a pET plasmid as previously published for the production of active 
HydA1 in E. coli. 

Protein purity was assessed by gel electrophoresis by loading samples on Any kD 
Mini-Protean TGX precast gels (Biorad) alongside Precision Plus Protein stan- 
dards (Biorad). Migration was achieved on a Mini-Protean apparatus (Biorad) at 
200 V for 30 min. Protein concentrations were determined with the Biorad Protein 
Assay, using bovine serum albumin as a standard as well as by optical absorption 
measurements. Aerobic ultraviolet-visible absorption spectra were recorded on a 
Cary 1Bio spectrophotometer (Varian) and anaerobic measurements were made 
with a fibre-optic-fitted UvikonXL spectrophotometer (BioTek Instruments). Iron 
and sulphur quantification were performed following the methods of refs 33 and 
34, respectively. The specific hydrogenase activity was determined as described 
previously”. 

Spectroscopic characterization. FTIR spectra of protein samples were recorded 
on a Bruker IFS 66 v/s FTIR spectrometer equipped with a Bruker MCT (mercury 
cadmium telluride) detector. The spectrometer was controlled by Bruker Opus 
software. All measurements were performed at 15 °C with a resolution of 2 cm⁻¹. 
The spectra were accumulated in the double-sided, forward-backward mode with 
1,000 scans. Data were processed using custom-written routines in the MATLAB 
programming environment. FTIR samples of complexes 1-3 and x-HydF 
(x = 1-3) were prepared in HEPES buffer (20 mM, 100 mM KCl) pH 7.5. FTIR 
samples of x-HydA1 were prepared in 10 mM Tris-HCl buffer pH 8.0 containing 
sodium dithionite (10 mM). For the FTIR measurement of maturated HydA1, 
apo-HydA1 was washed twice with 10 mM Tris-HCl pH 8.0, 2 mM sodium dithionite 
(a buffer referred to below as TPW2) by concentration and dilution to remove 
any trace of desthiobiotin originating from the prior purification of apo-HydA1 by 
strep-tag affinity chromatography. 100 µl of TPW2 buffer containing 100 µM apo-HydA1 
and a fourfold molar excess of the hybrid protein (1-HydF, 2-HydF or 
3-HydF, respectively) was incubated for 60 min at 37 °C. Afterwards, 500 µl of 
TPW2 buffer was added, and the solution was loaded on a 750 µl Strep-Tactin 
Superflow (IBA) column. The HydA1 protein was separated from 1-HydF (1-HydA1), 
2-HydF (2-HydA1) or 3-HydF (3-HydA1) by affinity chromatography 
using 10 mM Tris-HCl pH 8.0, 2 mM sodium dithionite, 200 mM NaCl as washing 
buffer and TPW2 buffer, 2.5 mM desthiobiotin for elution. The isolated protein 
was concentrated using Amicon Ultra centrifugal filters 10K (Millipore) and stored 
as described previously. For the FTIR spectra shown in Supplementary Fig. 7, the 
preparation was done as described above with 2-HydF but without the final 
purification step. 

X-band EPR spectra were recorded on a Bruker ESP 300D spectrometer equipped 
with an Oxford Instruments ESR 900 flow cryostat. Protein samples were anaerobically 
reduced with 10 molar equivalents of sodium dithionite before freezing. 
Hyperfine sublevel correlation (HYSCORE) spectra and their Combination Frequency 
(CF) - Nuclear Frequency (NF) variants were recorded on a Bruker Elexsys 
E-580 X band (frequency, 9.71 GHz) pulsed spectrometer with a Bruker ER4118X 
dielectric resonator and continuous-flow He cryostat (Oxford Instruments CF935) 
controlled by an Oxford Instruments temperature controller ITC 503. Experiments 
(128 × 128 data set) were performed at 8 K using the standard four-pulse sequence 
(π/2-τ-π/2-t1-π-t2-π/2-echo) with a nominal pulse width of 16 ns for π/2 pulses, 
a τ value of 132 ns and a pulse repetition rate of 1 kHz. In the HYSCORE experiment, 
the delays before (t1) and after (t2) the mixing π pulse were incremented in 
20-ns steps from an initial value (t_ini = 200 ns) according to the following formula: 
t1 = t_ini + d1 and t2 = t_ini + d2. In the CF-NF experiment, t1 and t2 were incremented 
in 20-ns steps according to the following formula: t1 = t_ini + d1 and t2 = t_ini + 
d1 + d2. The value of t_ini was chosen to be as long as 1,000 ns to remove as much as 
possible the broad features arising from ¹⁴N quadrupole coupling. Unwanted 
echoes were removed by four-step phase cycling. The background decay in both 
dimensions was subtracted using a linear fit followed by apodization with a 
Hamming window and zero-filling to 2,048 points in each dimension. The 2D 
Fourier transform magnitude spectrum was then calculated. The static magnetic 
field was set at 3,600 G (g, ). 
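
The time-domain processing steps listed above (linear background subtraction in both dimensions, Hamming apodization, zero-filling to 2,048 points and a 2D magnitude Fourier transform) can be sketched with NumPy as follows. This is not the authors' code: the function names are hypothetical, a full symmetric Hamming window is assumed because the text does not say which variant was used, and the 20-ns step is taken from the delay increments given in the Methods.

```python
import numpy as np


def _detrend_linear(a):
    """Subtract a least-squares straight line from every column (fit along axis 0)."""
    x = np.arange(a.shape[0])
    slope, intercept = np.polyfit(x, a, deg=1)      # each has one entry per column
    return a - (np.outer(x, slope) + intercept)


def process_2d_eseem(data, n_fft=2048):
    """Background-subtract, apodize, zero-fill and transform a 2D time-domain data set."""
    out = np.asarray(data, dtype=float)
    out = _detrend_linear(out)                      # linear background along dimension 1
    out = _detrend_linear(out.T).T                  # linear background along dimension 2
    window = np.outer(np.hamming(out.shape[0]), np.hamming(out.shape[1]))
    spectrum = np.fft.fft2(out * window, s=(n_fft, n_fft))   # s pads with zeros
    return np.abs(np.fft.fftshift(spectrum))        # magnitude spectrum, zero frequency centred


def frequency_axis_mhz(n_fft=2048, step_ns=20.0):
    """Frequency axis (MHz) matching the shifted spectrum for a given time step."""
    return np.fft.fftshift(np.fft.fftfreq(n_fft, d=step_ns * 1e-3))   # step in microseconds
```

With 20-ns increments the spectral window extends to ±25 MHz, and zero-filling a 128 × 128 data set to 2,048 points only interpolates the spectrum; it does not add resolution.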
DFT calculations. These were performed using the ADF2012 quantum chemistry 
code (see Supplementary Discussion). Hyperfine coupling constants were com- 
puted using the parameter-free PBEO exchange-correlation potential with triple- 
zeta basis sets (+ two polarization functions) and unfrozen cores. 
Melting during late-stage rifting in Afar is hot and deep 


D. J. Ferguson, J. Maclennan, I. D. Bastow, D. M. Pyle, S. M. Jones, D. Keir, J. D. Blundy, T. Plank & G. Yirgu 


Investigations of a variety of continental rifts and margins world- 
wide have revealed that a considerable volume of melt can intrude 
into the crust during continental breakup, modifying its composition 
and thermal structure. However, it is unclear whether the 
cause of voluminous melt production at volcanic rifts is primarily 
increased mantle temperature or plate thinning. Also disputed 
is the extent to which plate stretching or thinning is uniform or 
varies with depth, with the entire continental lithospheric mantle 
potentially being removed before plate rupture. Here we show 
that the extensive magmatism during rifting along the southern 
Red Sea rift in Afar, a unique region of sub-aerial transition from 
continental to oceanic rifting, is driven by deep melting of hotter- 
than-normal asthenosphere. Petrogenetic modelling shows that 
melts are predominantly generated at depths greater than 80 kilo- 
metres, implying the existence of a thick upper thermo-mechanical 
boundary layer in a rift system approaching the point of plate 
rupture. Numerical modelling of rift development shows that when 
breakup occurs at the slow extension rates observed in Afar, the 
survival of a thick plate is an inevitable consequence of conduc- 
tive cooling of the lithosphere, even when the underlying astheno- 
sphere is hot. Sustained magmatic activity during rifting in Afar 
thus requires persistently high mantle temperatures, which would 
allow melting at high pressure beneath the thick plate. If extensive 
plate thinning does occur during breakup it must do so abruptly at 
a late stage, immediately before the formation of the new ocean 
basin. 

The geological record at rifted margins often preserves evidence for 
voluminous magmatism during continental breakup. However, the 
principal control on the creation of this thick transitional igneous crust 
remains a matter of considerable debate. Some authors suggest that 
small-scale convection within the mantle beneath thinned plates can 
account for the syn-rift melting; others have argued that increased 
mantle temperatures are also required, either via a short-lived thermal 
pulse or from a sustained increase that persists to the early stages of 
seafloor spreading. Also debated is the mechanism by which the 
thickness of the continental lithosphere is reduced from 100-150 km 
(ref. 14) to almost zero at the point of plate rupture. Some models 
propose that the lower part of the lithospheric mantle is preferentially removed at 
an early stage of breakup, possibly by stretching or detachment; 
others suggest that a significant portion of the lithospheric mantle 
remains until the final stage of plate thinning. 

Ethiopia offers an excellent opportunity to understand how mantle 
temperature and plate thinning covary during rifting because it exposes 
several stages of rift development, including the final stages of transition 
to oceanic spreading in northern Afar. This setting offers a distinct 
advantage over studies on passive continental margins, where 
inferences from the geological record cannot be compared directly to 
a priori constraints on features such as mantle temperature (for example, 
seismic wave speeds). 


Conventional rifting models predict a present-day thinning factor 
(initial thickness divided by final thickness) for the Afar lithosphere of 
approximately 3, in contrast to seismic data, which show that current 
crustal thinning in most of Afar has a factor of less than 2. This discrepancy 
between observed and predicted crustal thickness is likely to be 
partly the result of ‘magma-compensated’ thinning, whereby extensive 
melt addition to the crust has reduced net thinning, with a possible 
further contribution from magmatically accommodated extension. 
However, the response of the lower part of the Ethiopian lithosphere 
to extension remains unclear and it is debated whether the lower 
plate has been preferentially thinned, effectively compensating for 
the modest net crustal attenuation, or whether a significant thickness 
of the lithospheric mantle remains intact. A related controversy concerns 
the temperature of the Afar mantle, which exerts a fundamental 
control on the depth and extent of melting and is therefore a key 
parameter in understanding the ongoing magmatism and thermal 
structure of the upper mantle. 
Here we address these issues by developing a petrogenetic model 
for current rift-related magmatism in Afar. We test our petrogenetic 
results using a numerical model of rift development to investigate how 
the lithosphere has evolved over 30 million years of rifting and magmatism. 
Our results provide new constraints on plate thickness and 
mantle potential temperature Tp, thereby aiding discrimination between 
competing models for magmatism and extension during the final 
stages of continental breakup. 

We used mafic lavas erupted from on- and off-axis vents and fissures 
from the seismically and volcanically active Dabbahu segment, 
which is part of the Red Sea rift in west-central Afar (Fig. 1; Supplementary 
Tables 1 and 2). All lavas are enriched in incompatible 
trace elements compared to typical mid-ocean ridge basalt and have 
trace-element characteristics similar to the Ethiopian flood basalts, 
which were erupted about 30 million years ago, during the early 
stages of rifting and hotspot tectonism (Fig. 2). The off-axis lavas have 
consistently more enriched characteristics than their axial counter- 
parts, including higher ratios of light rare earth elements (REE) to 
heavy REE (that is, La/Sm; Fig. 2) and of middle REE to heavy REE 
(that is, Tb/Yb; Fig. 2) and they also have different major-element 
compositions (Supplementary Table 2). These geochemical trends 
occur over short length scales (around 20 km), with the implication 
that the plumbing systems feeding the main rift zone and off-axis 
volcanoes are separate throughout the crust. Calculated equilibrium 
pressure P and temperature T conditions between the major-element 
composition of the lavas and mantle peridotite (Fig. 3e; see Methods) 
indicate that the axial lavas preserve compositions consistent with con- 
ditions in the mantle at 2.3-2.6 GPa and 1,472-1,489 °C. Off-axis lavas 
give consistently lower and more variable values of 1,301-1,396 °C 
and 1.1-1.9 GPa. 

We interpret these different thermal regimes as resulting from vari- 
ations in the magma plumbing systems between these two regions. The 
Figure 1 | Map of the Dabbahu-Manda Hararo magmatic segment. The 
Dabbahu-Manda Hararo rift zone has formed along the on-land section of the 
Red Sea rift system, near the rift-rift-rift triple junction (red lines shown in 
inset; rectangle shows location of main figure). Symbols show locations of on-axis 
(circles) and off-axis (squares) lava samples. The magma chamber feeding 
recent axial dyke intrusions and eruptions is located beneath the central part 
of the segment. Arrows in the inset show motions of Somalian and Arabian 
plates relative to a fixed point (oval) in Nubia. 


on-axis melts appear to be extracted in a rapid and/or chemically iso- 
lated fashion, thus preserving P-T conditions that reflect a mean of pri- 
mary melting processes, consistent with our trace-element modelling 
presented here. In contrast, the off-axis lavas are likely to have been 
modified during ascent by reactive re-equilibration at lower P-T con- 
ditions within the lithospheric mantle. 

We constrained the conditions of mantle melting using the observed 
REE concentrations of the lavas, which during mantle melting vary as a 
function of source composition, lithology and melt productivity with 
depth. A key feature of these REE melting models is the effect on the 
medium REE/heavy REE values in the melt phase caused by the pre- 
sence of residual garnet in the melting lithology. Garnet preferentially 
retains the heavy REE within its crystal lattice, so melts generated at 
Figure 2 | Trace-element compositions of mafic lavas from Afar. Lavas 
erupted from off-axis vents show consistent enrichments in the most 
incompatible elements compared to those from the axial part of the rift zone. 
The grey area shows compositions of Ethiopian flood basalts identified as 
most closely resembling the Afar hotspot/plume. Both the current Afar lavas 
and the older flood basalts show similar enrichments in incompatible elements 
compared to average mid-ocean ridge basalt (MORB) compositions. Listed 
ratios of La/Sm and Tb/Yb are means with 1σ variations. Compositions are 
normalized to that of primitive mantle. 
higher pressure where garnet is stable will have higher medium REE/ 
heavy REE compared to those generated by melting at lower pressures 
in its absence. Our REE modelling (see Methods) therefore acts partly 
as a melting barometer, which we used to constrain how the composi- 
tion of the accumulated melt reflects the distribution of polybaric melt 
production throughout the region of mantle melting. 

We started with simple forward models of melting during passive 
upwelling of mantle peridotite. In this model, steady-state adiabatic 
fractional melting occurs at a given mantle Tp and up to a specified 
depth. We initially tested melting of normal-temperature mantle 
(Tp = 1,350 ± 50 °C; ref. 24), with melting following the 
geodynamic model for Afar of ref. 13: melting begins between 
depths of 70 km and 80 km and melt production continues until a 
depth of 40 km. The compositions of melts generated from these 
regimes (Fig. 3a) do not match the observed REE concentrations 
and we found a particularly poor fit to the relative concentration ratios. 

Next we used a series of forward models at various Tp values and 
melting intervals, from which we found that the erupted melts can be 
matched best by mantle melting at elevated Tp (1,450 °C; similar to 
that proposed for Afar by ref. 19), with a relatively deep onset (about 
95 km) and termination (about 80 km) of melting (Fig. 3b). This short 
melting interval led to a greater overall contribution from melts gene- 
rated in the garnet stability field (over 86 km depth), which is required 
to retain the relatively high medium REE/heavy REE observed in the 
erupted melts. These results therefore imply that melt production 
beneath Afar initiates at relatively high pressures and temperatures, 
and does not continue to shallow depths. 

We developed our model further by using the inversion scheme of 
ref. 23 to obtain physical melting conditions from the observed REE 
concentrations. As before, we examined shallow melting regimes associated 
with lower Tp values, but because such melting is initiated above 
the garnet stability field, varying the melting rate with depth cannot 
lead to a closer match with the observed melts (Supplementary Fig. 1a). 
Increasing mantle Tp to allow initial melting to occur in the garnet 
stability field led to an improved fit to the observed melts, but in cases 
where melting is allowed to continue to shallow depths, the require- 
ment to preserve the high medium REE/heavy REE leads to very low or 
negligible melt production being predicted in the upper parts of the 
melting region. Using the thermal conditions given by the major- 
element P-T results, which suggest a melting path close to the mantle 
adiabat for a Tp of 1,450 °C (Fig. 3), we obtained a very close fit to the 
REE composition of the axial lavas where melting occurs between 
depths of approximately 95-80 km (Fig. 3c), in excellent agreement 
with the forward model with similar parameters (Fig. 3b). 

The clear results from both these REE models are that (1) temperatures 
hotter than normal ambient mantle (that is, Tp > 1,300-1,400 °C, 
ref. 24) are required to generate deep melting, and (2) melting 
initiated at depths below the garnet-spinel phase transition and insig- 
nificant melt generation occurred at depths shallower than about 
80 km. The equilibration pressures calculated for these lavas of about 
2.5 GPa (Fig. 3e) are consistent with pooling of melts from this melting 
region. An inversion model for the off-axis melt compositions with the 
same parameters yields similar results (Supplementary Fig. 1b), but 
with a slightly lower overall extent of melting (that is, a shorter melting 
column), suggesting that upwelling and melting have become focused 
beneath the rift axis and that off-axis volcanism is fed by melts from the 
margins of the melting region. 

Our geochemical modelling shows that significant asthenospheric 
upwelling and melting beneath central Afar is presently confined to 
depths greater than around 80 km. Thus, although our results do not 
provide precise constraints on lithospheric thickness, they clearly demon- 
strate that melting at shallower depths is suppressed, implying that the 
thermo-mechanical boundary layer beneath Afar remains relatively 
thick. Plate reconstructions* show that since rift initiation at about 
30 million years ago, Afar has extended by a factor of around 3, from 
a starting rift width of approximately 100 km, at rates not exceeding 
Figure 3 | Results of REE and rifting models. a-c, Observed (circles) and 
predicted (lines) REE concentrations. a, Results from a forward model (melting 
stops at 40 km) at Tp values of 1,350 °C (Fmax = 0.12; black line) and 1,400 °C 
(Fmax = 0.18; grey line). b, Results of the best-fit REE forward model (melting 
stops at 80 km), for a Tp value of 1,450 °C (Fmax = 0.055). c, Results of the 
best-fit inversion model for a Tp value of 1,450 °C (Fmax = 0.055). d, Cumulative 
degree of melting with depth for the best-fitting inversion model (blue line) and 
from the forward model (red line), both with a Tp of 1,450 °C. Error bars are 
1 s.d. e and f, Pressure-temperature conditions of mantle-melt equilibria for 
Afar lavas and predicted geothermal gradient and melting conditions after 
30 million years of rifting. Melting occurs when the temperature of the 


20 mm yr⁻¹ (ref. 26), equivalent to those of ultra-slow-spreading 
oceanic ridges*’. Conventional instantaneous plate stretching models’* 
predict a present-day lithospheric thickness of less than 40 km (assuming 
an initial thickness of about 125 km), with the implication that 
melting and upwelling should be markedly shallower than is observed. 
However, finite-duration rifting models show that when extension 
occurs at low strain rates, such as those observed in Afar, the base of 
the lithosphere may be strongly affected by conductive cooling”. 

We quantify this effect for Afar using a numerical finite-duration 
rifting and melting model”*” (see Methods) to examine how the history 
of rifting may have affected melt production and plate thinning. This 
approach adds to the petrological and REE modelling by explicitly 
examining the relationships between rifting, mantle upwelling, the geo- 
therm (and hence lithospheric thickness) and melting. Unlike in the 
REE model, which assumes steady-state adiabatic melting, the genera- 
tion of melts during upwelling in the thermal model considers both the 
advection and conduction of heat. We track the development of the 
geotherm and associated melting during 30 million years of rifting at 
strain rates appropriate for Afar. 

The results (Fig. 3e, f) show that lithospheric thinning and mantle 
melting are significantly reduced in comparison with an instantaneous 
stretching model (also shown in Fig. 3e). The average depth of melting 
at 30 million years is predicted to be about 80 km, with the onset of 
melting at 95 km and no melt production occurring at depths shallower 
than 60 km. The greater lithospheric thickness and melting 
depth in the finite-duration rifting model than in the instantaneous 
case arise both because low upwelling rates permit significant conduc- 
tive cooling, and also because the geotherm and melting region have 
not yet evolved to a steady state. These results show that the presence of 
upwelling mantle oversteps the solidus (orange line in e). The thick line in 
e shows the temperature profile affected by conductive cooling (30 million years 
of rifting) compared to the model with instantaneous extension (dashed line). 
No melting is predicted above a depth of 60 km (shaded area) because the 
instantaneous melting rate (thin line in f) falls to zero. The time-dependent 
cumulative degree of melting from the thermal model after 30 million years of 
rifting (thick line in f) shows good agreement with the REE results and is 
significantly less than that predicted by the instantaneous rifting model (dashed 
line in e). The portion of the cumulative melt degree curve (thick line) that 
decreases upward in f represents mantle that melted previously but has now 
moved upwards. Tp is 1,450 °C. DMH, Dabbahu-Manda Hararo rift zone. 


the thick thermal boundary layer in Afar implied by the petrogenetic 
results is an inevitable consequence of the protracted breakup process 
occurring here. Owing to the different and complementary assump- 
tions inherent in the REE and thermal modelling techniques (steady- 
state adiabatic melting versus evolving thermal regime) we do not expect 
a direct correlation between cumulative melting curves predicted by 
these (see Methods). However, the two classes of model give generally 
consistent results and both predict similar extents of melting beneath a 
60-80-km-thick lithosphere (Supplementary Fig. 2). 

Geophysical studies from Ethiopia and elsewhere have demon- 
strated the significant role of magma intrusion in modifying crustal 
compositions and maintaining crustal thickness during continental 
rifting'’”'°. Both our modelling approaches show that the ongoing mag- 
matism here requires the underlying mantle to have an elevated poten- 
tial temperature of around 1,450 °C, allowing partial melting to occur 
at high pressures beneath a 60-80-km-thick lithospheric lid. This 
mantle Tp is consistent with previous estimates by ref. 19 as well as 
the markedly slow seismic wave speeds observed in the Ethiopian 
mantle*’’. Although shallow magmatic processes in the Afar crust 
currently have some affinity with those observed at the mid-ocean 
ridges””’, our results show that net lithospheric thinning of this slowly 
extending continental lithosphere remains modest. If a new ocean 
basin were to open here an abrupt phase of late-stage plate thinning 
would therefore probably be required’®. 


METHODS SUMMARY 

P-T conditions of mantle-melt equilibration. A peridotitic, rather than pyro- 
xenitic, source was determined using diagnostic elemental ratios such as Zn/Fe 
and Mn/Fe (Supplementary Fig. 1). P-T conditions of mantle-melt equilibria were 
calculated using the Si and Mg thermobarometer of ref. 30 for lavas with 
MgO > 7 wt% and assuming an H₂O content of 0.5 wt%. These were corrected 
for olivine removal to be in equilibrium with Mg-rich olivine compositions of 
Fo₉₀ (90% forsterite) using an Fe-Mg olivine-melt partition coefficient of 0.3 and 
Fe³⁺/ΣFe of 0.16, determined from oxygen fugacity conditions constrained by 
analysis of V in olivine and the host lavas. 

REE melting calculations. The source composition for melting models was cal- 
culated by matching the Nd isotopic composition of lavas by mixing primitive and 
depleted mantle. Forward® and inversion” REE models were then used to estimate 
the cumulative degree of melting against depth relationship present in the mantle. 
For the inversion we used REE data from high MgO lavas, corrected for fractiona- 
tion. Melting was assumed to be fractional and the garnet-spinel transition depth 
was set at 86-100 km based on experimental studies. 

Two-dimensional rifting model. We modelled a square-sided rift undergoing 
pure shear stretching with a total strain of 3 over 30 million years, in line with ref. 26. 
The equilibrium thickness of the lithosphere was 125 km. Thermal structure was 
tracked using a two-dimensional explicit finite difference scheme that includes 
vertical and lateral advection and conduction of heat and adiabatic cooling. The 
local instantaneous melt production rate was calculated using expressions in ref. 29, 
which account for depressurization and change in thermal structure. Parameterizations 
in ref. 8 were used for solidus, liquidus and melting progress. Cumulative 
degree of melting was calculated by integrating the melting rate over time, account- 
ing for mantle upwelling. The geotherm and melting results in Fig. 3 are from the 
centre of the model. 


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS 


Source lithology and thermobarometry. Diagnostic elemental ratios such as 
Zn/Fe (Supplementary Fig. 3; ref. 31) indicate an origin for the observed basalts 
predominantly by peridotite melting. Equilibrium P and T conditions for the 
lavas and mantle peridotite, shown in Fig. 3e, were calculated using a Si and Mg 
thermobarometer”’. Before applying the thermobarometer, lavas with MgO > 7 wt% 
(Supplementary Table 1) were corrected for olivine fractionation by incremental 
olivine addition using an Fe-Mg olivine-melt partition coefficient of 0.3 until 
equilibrium with 90% forsterite (Fo₉₀) olivine. An important factor in this correction 
is the Fe³⁺/ΣFe ratio in the melt and we use a value of 0.16 determined 
directly for the Afar lavas using oxygen fugacity conditions constrained by ana- 
lysis of V in olivine and melts using the methods of refs 32 and 33. The limited 
volatile data®* available for basaltic melts from Afar suggest that pre-eruptive 
H₂O contents are relatively low and we use an assumed value of 0.5 wt%. Using 
0.1 wt% or 1 wt% leads to temperature estimates that are about 10 °C higher or 
lower respectively. Several factors can affect the final major element composition 
of the melts and the calculated P-T relations therefore reflect some mean of 
these”. In the case of the axial lavas we interpret these to be mean melting con- 
ditions. The off-axis lavas give a lower range of P-T estimates, implying at least 
partial re-equilibration at shallower depths within the lithosphere/thermal boundary 
layer, demonstrating variations in melt transport between the rift axis and 
off-axis volcanoes. 

REE melting models. The starting mantle source composition was calculated by 
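matching the Nd isotopic composition of the lavas ($\varepsilon_{\mathrm{Nd}}$ of about 5; Supplementary 
Table 2) by mixing primitive and depleted mantle end-member compositions 
from refs 36 and 37. ($\varepsilon_{\mathrm{Nd}} = [(^{143}\mathrm{Nd}/^{144}\mathrm{Nd})_{\mathrm{sample}}/(^{143}\mathrm{Nd}/^{144}\mathrm{Nd})_{\mathrm{CHUR}} - 1] \times 10^{4}$, 
where CHUR is ‘chondritic uniform reservoir’ with a $^{143}\mathrm{Nd}/^{144}\mathrm{Nd}$ of 0.512638.) 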
REE inversion modelling” was used to estimate the relationship between the cumu- 
lative degree of melting and depth present in the mantle. Partition coefficients were 
taken from the compilation of ref. 39 and the garnet-spinel transition depth was set 
from 86-100 km in the model runs, based on the experimental results of ref. 40. 
The initial source lithology was 59.8% olivine, 21.1% orthopyroxene, 7.6% clino- 
pyroxene and 11.5% garnet for garnet peridotite; and 57.8% olivine, 27% ortho- 
pyroxene, 11.9% clinopyroxene and 3.3% spinel for spinel peridotite. Melting was 
assumed to be fractional, and the melts were integrated over a triangular melting 
region with a constant upwelling rate. This melting geometry recovers lower melt 
fractions in the garnet field than inversion models with a one-dimensional column, 
so we believe that our conclusion of deep melting at high Tp is not strongly dependent 
on the melting geometry. Only high-MgO basalts were used in the inversion 
runs, which were corrected for 30% fractionation using the methods of ref. 24. 
Cumulative melting curves from the inversion modelling were compared with 
theoretical curves for adiabatic upwelling at different mantle potential temperatures 
(the forward model). These curves were calculated using the parameterization 
of ref. 8, but with an entropy of fusion of 350 J kg⁻¹ °C⁻¹ (ref. 41). 
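
The epsilon-Nd notation used for the source composition can be made concrete with a short sketch. It uses the standard definition relative to CHUR and the CHUR ratio quoted in these Methods; the function names are hypothetical and no measured sample ratios are assumed.

```python
CHUR_143ND_144ND = 0.512638  # CHUR 143Nd/144Nd ratio quoted in the Methods


def epsilon_nd(ratio_sample, ratio_chur=CHUR_143ND_144ND):
    """Deviation of a sample 143Nd/144Nd ratio from CHUR, in parts per 10^4."""
    return (ratio_sample / ratio_chur - 1.0) * 1.0e4


def ratio_from_epsilon_nd(eps, ratio_chur=CHUR_143ND_144ND):
    """Inverse relation: the 143Nd/144Nd ratio that gives a chosen epsilon-Nd."""
    return ratio_chur * (1.0 + eps * 1.0e-4)


# Ratio corresponding to the epsilon-Nd of about 5 reported for the lavas.
ratio_for_eps5 = ratio_from_epsilon_nd(5.0)
print(f"{ratio_for_eps5:.6f}")
```

An epsilon-Nd of about 5 therefore corresponds to a 143Nd/144Nd ratio of roughly 0.51289, a difference from CHUR in the fourth decimal place, which is why the deviation is scaled by 10^4.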
Numerical thermal rifting model. We modelled a square-sided rift undergoing 
pure shear stretching (that is, linear variation in upwelling rate with depth) with a 
total strain factor (β) of 3 over a period of 30 million years. Strain rate in the most 
recent ten million years of rifting was set to 1.5 times the strain rate in the first 
20 million years because there is evidence that strain rate has accelerated during 
the rifting period’*. The initial rift width of 65 km was chosen to give a final width 
of 400 km. The equilibrium thickness of the lithosphere was set to 125 km. Evo- 
lution of the thermal structure of the lithosphere was tracked using a two- 
dimensional explicit finite difference scheme that includes vertical and lateral 
advection and conduction of heat** with the addition of adiabatic cooling. The 
geotherm and melting results plotted in Fig. 3e are from the centre of the model. 
Local instantaneous melt production rate was calculated using equation (7) in 
ref. 29, which accounts for both depressurization through upwelling and also 
temperature change by advection and conduction of heat. The parameterizations 
of ref. 8 were used for solidus, liquidus and melting progress. Cumulative degree 
of melting was calculated by integrating the instantaneous melt production rate 
over time, accounting for ongoing mantle upwelling through the melting region. 
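
The explicit finite-difference treatment of heat advection and conduction can be sketched in one dimension. This is a simplified illustration, not the two-dimensional model described above: it assumes a fixed 125-km depth grid, a linear starting geotherm with fixed surface and basal temperatures, a constant logarithmic strain rate that gives a stretch factor of 3 in 30 million years, and a thermal diffusivity of 10⁻⁶ m² s⁻¹. Adiabatic cooling, lateral heat transport, the late acceleration in strain rate and melting are all omitted, and every name in the block is hypothetical.

```python
import numpy as np

SECONDS_PER_MYR = 3.156e13


def evolve_geotherm(thickness_m=125e3, nz=126, kappa=1.0e-6,
                    stretch_factor=3.0, duration_myr=30.0,
                    t_surface=0.0, t_base=1350.0):
    """Explicit 1D sketch of a geotherm during pure-shear upwelling.

    Solves dT/dt = kappa * d2T/dz2 + u(z) * dT/dz on a fixed depth grid,
    where z is depth and u(z) = strain_rate * z is the upward speed,
    which varies linearly with depth as in pure shear.
    """
    z = np.linspace(0.0, thickness_m, nz)
    dz = z[1] - z[0]
    duration_s = duration_myr * SECONDS_PER_MYR
    # Assumed constant logarithmic strain rate that yields the stretch factor.
    strain_rate = np.log(stretch_factor) / duration_s
    u_up = strain_rate * z

    # Time step inside both the diffusive and advective stability limits.
    dt = 0.25 * dz**2 / kappa
    if u_up.max() * dt / dz >= 0.5:
        raise ValueError("advective stability limit exceeded; refine the time step")
    n_steps = int(np.ceil(duration_s / dt))
    dt = duration_s / n_steps

    t_init = t_surface + (t_base - t_surface) * z / thickness_m
    temp = t_init.copy()
    for _ in range(n_steps):
        conduction = kappa * (temp[2:] - 2.0 * temp[1:-1] + temp[:-2]) / dz**2
        # Upwind difference: upwelling carries heat from the deeper neighbour.
        advection = u_up[1:-1] * (temp[2:] - temp[1:-1]) / dz
        # Interior nodes only; boundary temperatures stay fixed.
        temp[1:-1] += dt * (conduction + advection)
    return z, t_init, temp


depth, initial, final = evolve_geotherm()
mid = len(depth) // 2
print(f"mid-depth temperature: {initial[mid]:.0f} -> {final[mid]:.0f} degrees C")
```

Even in this reduced form the competition is visible: upwelling steepens the shallow geotherm, while conduction to the fixed surface limits how far the hot material can rise before it cools.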
Comparison of REE and thermal rift models. The two approaches offer com- 
plementary insights into melting beneath Afar, but several important differences 
should be borne in mind when comparing the results. First, the thermal rifting 
model evolves over time, whereas the REE melting model assumes steady state. 
Second, conductive cooling in the thermal model allows a sub-adiabatic melting 
path and suppresses melting as the material moves upward, whereas the forward 
REE melting model assumes an adiabatic melting path up to the specified top of 
the melting region. These differences mean that cumulative degree of melting can 
both increase and decrease upwards in any snapshot of the thermal rift model, but 
it can only increase upwards in the REE melting model. The portion of the cumu- 
lative melt degree curve that decreases upward (Fig. 3f) represents mantle that 
melted previously but has now welled up and out of the melting region. Therefore 
the thermal and REE models are expected to be most closely comparable in the 
deepest part of the melting region, where the cumulative degree of melting in- 
creases upwards, and Supplementary Fig. 2b shows an excellent match in this 
region. The base of the lithosphere is specified as a single value in the REE model, 
while in the thermal model it is calculated as the zone over which progress of 
melting decays from a maximum to zero. Given these different definitions, it is 
encouraging that there is a difference of only 9 km between the top of the melting 
region in the best-fitting REE models (80 km) and the depth of maximum melting 
progress automatically determined by the thermal model (71 km) (Fig. 3d, f). 
Results can also be compared in terms of degrees of melting and proportions of 
total melting within the garnet and spinel stability zones, which determine the REE 
concentrations in the melt (Supplementary Fig. 2a). The models all involve the 
onset of melting at 93-96 km. At 86 km (top of garnet stability) the thermal rifting 
model has melted by 2.37%, and 37% of the total melt is generated in the garnet 
field. For the best REE forward (or inversion) model, 2.6% (or 2.9%) melting has 
occurred at the top of the garnet stability field and 48% (or 54%) of the total melt is 
generated in the garnet field. Therefore, the thermal rifting model and the forward 
and inverse REE models are comparable in terms of predicted REE concentrations. 
The rich fossil record of equids has made them a model for evolutionary 
processes’. Here we present a 1.12-times coverage draft genome 
from a horse bone recovered from permafrost dated to approximately 
560-780 thousand years before present (kyr BP)**. Our data represent 
the oldest full genome sequence determined so far by almost an order 
of magnitude. For comparison, we sequenced the genome of a Late 
Pleistocene horse (43 kyr BP), and modern genomes of five domestic 
horse breeds (Equus ferus caballus), a Przewalski’s horse (E. f. prze- 
walskii) and a donkey (E. asinus). Our analyses suggest that the Equus 
lineage giving rise to all contemporary horses, zebras and donkeys 
originated 4.0-4.5 million years before present (Myr BP), twice the 
conventionally accepted time to the most recent common ancestor 
of the genus Equus*’. We also find that horse population size fluctu- 
ated multiple times over the past 2 Myr, particularly during periods 
of severe climatic changes. We estimate that the Przewalski’s and 
domestic horse populations diverged 38-72 kyr BP, and find no evidence 
of recent admixture between the domestic horse breeds and the 
Przewalski’s horse investigated. This supports the contention that 
Przewalski’s horses represent the last surviving wild horse population®. 
We find similar levels of genetic variation among Przewalski’s and 
domestic populations, indicating that the former are genetically viable 
and worthy of conservation efforts. We also find evidence for continu- 
ous selection on the immune system and olfaction throughout horse 
evolution. Finally, we identify 29 genomic regions among horse breeds 
that deviate from neutrality and show low levels of genetic variation 
compared to the Przewalski’s horse. Such regions could correspond to 
loci selected early during domestication. 

In 2003, we recovered a metapodial horse fossil at the Thistle Creek 
site in west-central Yukon Territory, Canada (Fig. 1a). The fossil was 


from an interglacial organic unit associated with the Gold Run volcanic 
ash, dated to 735 ± 88 kyr BP*” (Fig. 1b). Relict ice wedges below the 
unit indicate persistent permafrost since deposition (Supplementary 
Information, section 1.1), whereas the organic unit, hosting the fossil, 
indicates a period of permafrost degradation, or a thaw unconformity’, 
during a past interglacial as warm or warmer than present’, and rapid 
deposition during either marine isotope stage 19, 17 or 15. This indicates 
that the fossil dates to approximately 560-780 kyr BP. The metapodial 
shows typical caballine morphology, consistent with Middle 
rather than the smaller Late Pleistocene horse fossils from the area 
(Fig. 1c and Supplementary Information, section 1.2). This age is con- 
sistent with small mammal fossils from this unit indicating a Late 
Irvingtonian, or Middle Pleistocene, age’, and infinite radiocarbon 
dates’. 

Theoretical’ and empirical evidence’® indicates that this age approaches 
the upper limit of DNA survival. So far, no genome-wide information 
has been obtained from fossil remains older than 110-130 kyr BP"’. 
Time-of-flight secondary ion mass spectrometry (TOF-SIMS) on the 
ancient horse bone revealed secondary ion signatures typical of collagen 
within the bone matrix (Fig. 2a and Supplementary Table 7.1), and high- 
resolution tandem mass spectrometry sequencing” revealed 73 proteins, 
including blood-derived peptides (Supplementary Information, section 
7.4). This is consistent with good biomolecular preservation, suggesting 
possible DNA survival. Therefore, we conducted larger-scale destructive 
sampling for genome sequencing. 

We used Illumina and Helicos sequencing to generate 12.2 billion 
DNA reads from the Thistle Creek metapodial. Mapping against the 
horse reference genome yielded ~1.12× genome coverage. We based 
the size distribution of ancient DNA templates on collapsed Illumina 
Figure 1 | The early Middle Pleistocene horse metapodial from Thistle 
Creek (TC). a, Geographical localization. b, Stratigraphic setting. 
c, Morphological comparison to Middle and Late Pleistocene horses from 
Beringia. Simpson’s ratio diagrams contrasting log₁₀ differences in 10 
metapodial measurements between horse fossils and a reference (E. hemionus 
onager) are shown for a series of 9 and 30 horses from the Middle and the Late 
Pleistocene era, respectively (Supplementary Information, section 1.2). The full 


read pairs (Supplementary Fig. 4.4), yielding an average length of 
77.5 base pairs (bp). The specimen is male based on X to autosomal 
chromosome coverage (Supplementary Information, section 4.2b) and 
the presence of Y-chromosome markers (Supplementary Informa- 
tion, section 4.1d). Endogenous read content was lower for Illumina 
(0.47%) than Helicos (4.21%) using standard*® or improved”* single- 
strand template preparation procedures. This is probably due to 3’ ends 
available at nicks, resistance of undamaged modern DNA contamin- 
ants to denaturation, and Helicos ability to sequence short templates. 
Despite this, endogenous DNA content was >16.6-20.0-fold lower 
than for Saqqaq Palaeo-Eskimo'* and Denisovan specimens’, both 
sequenced to high depth. 

Several observations support genome sequence authenticity. First, a 
348-bp mitochondrial control region segment was replicated indepen- 
dently (Supplementary Fig. 2.2 and Supplementary Information, sec- 
tion 2.4). Second, phylogenetic analyses on data obtained with two 
sequencing platforms in different laboratories are consistent (Sup- 
plementary Fig. 8.4), ruling out post-purification contamination. Third, 
autosomal, Y-chromosomal and mitochondrial DNA analyses place the 
Thistle Creek specimen basal to Late Pleistocene and modern horses 
(Fig. 3a and Supplementary Figs 8.1-8.4). Fourth, we found signs of 
severe biomolecular degradation, including levels of cytosine deami- 
nation at overhangs considerably higher than observed in 28 younger 
permafrost-preserved fossils from the Late Pleistocene (Fig. 2c, Sup- 
plementary Fig. 6.40 and Supplementary Table 6.1) and protein deami- 
dation levels'*'® (Fig. 2b and Supplementary Information, section 7.5) 
greater than those reported for younger permafrost-preserved bones. 

We additionally sequenced genomes of a 43-kyr-old (pre-domestication) 
horse (1.8× coverage), a modern donkey (16×; Supplementary Fig. 4.1), 
5 modern domestic horses (Arabian, Icelandic, Norwegian fjord, Standardbred 
and Thoroughbred; 7.9×-21.1×) and one modern Przewalski’s 
horse (9.6×; Supplementary Table 2.1), considered to possibly 
represent the last surviving wild horse population. We used this data set 
to address fundamental questions in horse evolution: (1) the timing of 
the origins of the genus Equus; (2) the demographic history of modern 
horses; (3) the divergence time of horse populations forming the Prze- 
walski’s and domestic lineages; (4) the extent to which the Przewalski’s 
horse has remained isolated from domestic relatives; (5) the timing of 
gene expansions within the horse genome; (6) the identification of genes 
potentially under selection during horse evolution. 

As no accepted Equus fossils exist before 2.0 Myr BP*° (Supplementary 
Information, section 9.1d), the date of the last common ancestor that 
gave rise to extant horses versus donkeys, asses and zebras” remains 
heavily debated. Proposed dates extend as early as 4.2-4.5 Myr BP on the 
basis of palaeontological estimates'* to over 6.0 Myr BP according to 
molecular analyses'’. We addressed this issue by taking advantage of 
the established age for the Thistle Creek horse. As a sample cannot be 
older than the population it belonged to, we explored a full range of 
possible calibrations for the Equus most recent common ancestor 
(MRCA) and calculated the divergence time between the populations 
of the ancient Thistle Creek horse and modern horses” (Supplementary 
Information, section 10.1). Calibrations resulting in divergence times 
younger than the Thistle Creek bone age were rejected, providing a 
credible confidence range for the MRCA of Equus. We found rates 
consistent with the Equus MRCA living 3.6-5.8 Myr BP to be compatible 
with our data (Fig. 3b and Supplementary Figs 10.1-10.3). We also found 
support for slower mutation rates in horse than human (Supplementary 
Information, section 8.4 and Supplementary Table 8.5), implying a minimal 
date of 4.07 Myr BP for the MRCA of Equus (Supplementary 
Figs 10.1-10.3). We therefore propose 4.0-4.5 Myr BP for the MRCA 
of all living Equus, in agreement with recent molecular findings’” 
and the oldest palaeontological records for the monodactyle Plesippus 
simplicidens, which some'* consider the earliest fossil of Equus. Our 
result indicates that the evolutionary timescale for the origin of contem- 
porary equid diversity is at least twice that commonly accepted. 
Second, we reconstructed horse population demography over the 
last 2 Myr. The pairwise sequential Markovian coalescent (PSMC) 
approach! shows that horses experienced a population minimum 
approximately 125 kyr BP, corresponding to the last interglacial when 
environmental conditions were similar to now throughout their range. 
The population expanded during the cold stages of marine isotope 
stage (MIS) 4 and 3 as grasslands expanded. A peak was reached 
25-50 kyr BP and was followed by an approximately 100-fold collapse, 
probably resulting from major climatic changes and related grassland 
contraction after the Last Glacial Maximum” (Fig. 4 and Supplemen- 
tary Figs 9.4-9.5). A similar demographic history was inferred from 
Bayesian skyline reconstructions using 23 newly characterized ancient 
mitochondrial genomes (Supplementary Fig. 9.6). These results sup- 
port suggestions” that climatic changes are major demographic dri- 
vers for horse populations. PSMC analyses also revealed two earlier 
demographic phases (Fig. 4b and Supplementary Figs 9.4-9.5), with 
population sizes peaking 190-260 kyr BP and 1.2-1.6 Myr BP, respectively, 
followed by 1.7-fold and 8.1-fold collapses. Extremely low population 
sizes were inferred approximately 500-800 kyr BP, a time period 
Figure 2 | Amino acid, protein and DNA preservation of the Thistle Creek 
horse bone. a, Amino acid signatures. Secondary ions, characteristic of five 
amino acids over- or under-represented in collagen, were detected by TOF-SIMS 
(Supplementary Information, section 7.1). The size of secondary ion 
maps is 500 × 500 µm² with a resolution of 256 × 256 pixels. b, Glutamine 
deamidation. The observed distribution of glutamine deamidation levels 
(Supplementary Information, section 7.5) is blue for the Thistle Creek 
(TC) horse bone and green for a 43-kyr-old Siberian mammoth bone. 


that covers the divergence time of the Thistle Creek and contemporary 
horse populations. This result may relate to population fragmentation 
when horses colonized Eurasia from America, in agreement with the 
earliest presence of horses in Eurasia 750 kyr BP*. 

We next investigated whether Przewalski’s horse indeed represents 
the last survivor of wild horses. Native to the Mongolian steppes, this 
horse was listed as extinct in the wild (IUCN red list”*) but has been 
reassigned to endangered after successful conservation and reintro- 
duction. Using maximum likelihood phylogenetic analyses and topo- 
logical tests (Supplementary Information, sections 8.2-8.3), we found 
that the Przewalski’s horse genome falls outside a monophyletic group 
of domestic horses. The MRCA of Przewalski’s and domestic horse 
sequences dates to 341-431 kyr BP (Supplementary Table 8.3), a period 
consistent with previous estimates®. We estimated the divergence time 
between populations of Przewalski’s and domestic horses at approximately 
38-72 kyr BP (Supplementary Tables 10.4-10.6). Our 43 kyr BP 
horse genome branched off before the Przewalski’s and domestic horse 
lineages diverged (Fig. 3a). This specimen belonged to a population that 
diverged from that leading to modern horses approximately 89-167 kyr BP 


76 | NATURE | VOL 499 | 4 JULY 2013 



c, Post-mortem DNA damage. Maximum likelihood estimates of cytosine 
deamination at 5’ overhangs were estimated for 29 permafrost-preserved horse 
bones, including the Thistle Creek bone (Supplementary Information, section 
6.3). Mitochondrial and nuclear estimates are provided in red and blue, 
respectively. Calibrated radiocarbon dates (BC) are provided when available 
(Supplementary Tables 2.3-4). Error bars refer to 2.5% and 97.5% quantile 
values, estimated following convergence of the maximum likelihood procedure. 


(Supplementary Figs 10.1-10.3 and Supplementary Table 10.5), providing 
a maximal boundary for the younger divergence between Przewalski’s and 
domestic horses. 

Using quartet alignments and D statistics* (Supplementary Informa- 
tion, sections 12.1-12.3) we found no evidence for admixture between 
the Przewalski’s horse and the individual horse breeds investigated in 
this study using either the donkey or the ancient Thistle Creek genome as 
out-group (Supplementary Tables 12.1-12.3). Scanning the Przewalski’s 
horse genome, we also found no long tracts of shared polymorphisms 
with domestic horses (Supplementary Fig. 12.3), as would 
be expected if recent admixture occurred after the last wild individual 
was captured in the 1940s”°. Rather, we identified long tracts of variation 
unique to the Przewalski’s horse genome, including genes involved in 
immunity, cytoskeleton, metabolism and the central nervous system 
that could have been specifically selected in this lineage (Supplemen- 
tary Information, section 12.6). The average levels of polymorphism 
present in the Przewalski’s horse genome are greater than those observed 
in the Icelandic, Standardbred and Arabian horse genomes (Supplemen- 
tary Fig. 5.5 and Supplementary Table 11.10). Thus, unadmixed lineages 
are still present in the endangered Przewalski’s horse population, with 
levels of allelic diversity that can support long-term survival of captive 
breeding stocks despite descending from only 13-14 wild individuals”. 
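
The D-statistic test for admixture can be illustrated with a minimal sketch of the standard ABBA-BABA formulation. Whether the study counted sites in exactly this way is an assumption; the function names are hypothetical and the four short sequences are toy data invented for illustration, not values from the alignments.

```python
def count_abba_baba(p1, p2, p3, outgroup):
    """Count ABBA and BABA site patterns across four aligned sequences."""
    if not (len(p1) == len(p2) == len(p3) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    abba = baba = 0
    for a1, a2, a3, ao in zip(p1, p2, p3, outgroup):
        if a1 == ao and a2 == a3 and a1 != a2:
            abba += 1  # P2 and P3 share the derived allele
        elif a2 == ao and a1 == a3 and a1 != a2:
            baba += 1  # P1 and P3 share the derived allele
    return abba, baba


def d_statistic(abba, baba):
    """D = (ABBA - BABA) / (ABBA + BABA); values near zero indicate no excess sharing."""
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba)


# Toy alignment (hypothetical data, for illustration only).
breed_1 = "AACGTA"
breed_2 = "ATCGAA"
przewalski = "ATCGTA"
donkey = "AACGAA"

abba, baba = count_abba_baba(breed_1, breed_2, przewalski, donkey)
print(abba, baba, d_statistic(abba, baba))
```

In practice the significance of D is judged against its sampling variance across the genome, a step this sketch leaves out.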

Figure 3 | Horse phylogenetic relationships and population divergence
times. a, Maximum likelihood phylogenetic inference. We performed a
super-matrix analysis of 5,359 coding genes (Supplementary Information, section
8.3a, 100 bootstrap pseudo-replicates) and estimated the average age for the
main nodes (r8s semi-parametric penalized likelihood (PL) method,
Supplementary Information, section 8.3c; see Supplementary Table 8.3 for
other analyses). Asterisk indicates previously published horse genomes.

b, Population divergence times. We used ABC to recover a posterior
distribution for the time when two horse populations split over a full range of
possible mutation rate calibrations (Supplementary Information, section 10.1).
The first population included the Thistle Creek horse; the second consisted of
modern domestic horses. A conservative age range for the Thistle Creek horse is
reported between the dashed lines (560-780 kyr).

The sequencing of the horse reference genome showed increased
paralogous expansion rates in horses compared to humans and bovines
for certain functionally important gene families*® (Supplementary
Information, section 5.1c). Our data set revealed that a limited fraction 
of horse paralogues (1.7%, representing 258 paralogues) showed no hits 
among donkey reads, suggesting that most horse paralogues expanded 
before the origin of the genus Equus some 4.0-4.5 Myr BP. Among these
258 regions, 11 L1 retrotransposons and one copy of a keratin gene are 
absent from the ancient Thistle Creek horse genome but present in the 
43 kyr horse and modern horses (Supplementary Table 5.3), suggesting 
an expansion before their MRCA some 500-626 kyr BP (Supplementary
Table 8.3). Similarly, 44 L1-retrotransposon paralogues were found
only in modern horse genomes (Supplementary Table 5.4), indicating 
that expansion of L1 retrotransposons has remained active since then. 

Finally, we identified loci potentially selected in modern horses (Sup- 
plementary Figs 11.1-11.2), focusing on regions showing unusual 
densities of derived mutations (Supplementary Information, section 
11.1). We caution that local variations in mutation and recombination 
rates, as well as misalignments, may result in similar signatures at 
neutrally evolving regions. Functional clustering analyses revealed 
significant enrichment for immunity-related and olfactory receptor 
genes (Supplementary Table 11.4), two categories also enriched for non- 
synonymous single nucleotide polymorphisms (SNPs) (Supplementary 
Information, section 5.2d). Additionally, we identified 29 regions show- 
ing deviation from neutrality and significant reduction in genetic diver- 
sity among modern domestic horses compared to Przewalski’s horse 
(Supplementary Tables 11.8-11.9). Such regions could correspond to 
loci that have been selected and transmitted to all horse breeds investigated
here after divergence from the Przewalski’s horse population,
possibly related to domestication. These regions include genes for the
KIT ligand critical for haematopoiesis, spermatogenesis and melanogenesis,
and myopalladin involved in sarcomere organization.

Figure 4 | Horse demographic history. a, Last 150 kyr BP. PSMC based on
nuclear data (100 bootstrap pseudo-replicates) and Bayesian skyline inference
based on mitochondrial genomes (median, black; 2.5% and 97.5% quantiles,
grey) are presented following the methodology described in Supplementary
Information, section 9. The Last Glacial Maximum (19-26 kyr BP) is shown in
pink. b, Last 2 Myr BP. PSMC profiles are scaled using the new calibration values
proposed for the MRCA of all living members of the genus Equus (4.0 Myr,
blue; 4.5 Myr, red), and assuming a generation time of 8 years (for other
generation times, see Supplementary Figs 9.4 and 9.5).

Our study has pushed the timeframe of palaeogenomics back by almost 
an order of magnitude. This enabled us to readdress a range of questions 
related to the evolution of Equus—a group representing textbook exam- 
ples of evolutionary processes. The Thistle Creek genome also provided us 
with direct estimates of the long-term rate of DNA decay”, revealing that a 
significant fraction (6.0-13.3%) of short (25-bp) DNA fragments may 
survive over a million years in the geosphere (Supplementary Fig. 6.42). 
Thus, procedures maximizing the retrieval of short, but still informative, 
DNA may provide access to resources previously considered to be much 
too old. Methods have recently been developed for increasing the sequen- 
cing depth of ancient genomes’* but do not increase the percentage of 
endogenous sequences retrieved. Overcoming this technical challenge 
with whole-genome enrichment approaches, and lower sequencing costs, 
will make retrieval of higher coverage genomes from specimens with low 
endogenous DNA content practical and economical. 


METHODS SUMMARY 


Ancient horse extracts and DNA libraries were prepared in facilities designed to 
analyse ancient DNA following standard procedures*'*. Protein sequencing was 
performed using nanoflow liquid chromatography tandem mass spectrometry”. 
DNA sequencing was performed using Illumina and Helicos sequencing platforms*”’. 
Reads were aligned to the horse reference genome” and de novo assembled donkey 
scaffolds using BWA”. Maximum-likelihood DNA damage rates were estimated 
from nucleotide misincorporation patterns. Population divergence times were 
estimated disregarding transitions to limit the impact of replication of damaged 
DNA and following ref. 20 with quartet genome alignments instead of trios and 
implementing approximate Bayesian computation (ABC). 


Online Content Any additional Methods, Extended Data display items and Source 
Data are available in the online version of the paper; references unique to these 
sections appear only in the online paper. 
METHODS


Genome sequencing. All fossil specimens were extracted in facilities designed to 
analyse ancient DNA using silica-based extraction procedures*”*' (Supplementary 
Information, section 2). A total number of 16 ancient horse extracts were built into 
Illumina libraries (Supplementary Information, section 2) and shotgun-sequenced 
at the Centre for GeoGenetics (Supplementary Tables 2.3 and 4.9). The full mito- 
chondrial genome of a total number of 16 ancient horse specimens was captured 
using MYselect in-solution target enrichment kit (Supplementary Information, 
section 3.3b) following library construction”, and sequenced at Penn State/UCSC 
(Supplementary Tables 2.4 and 4.10). The combination of shotgun sequencing 
and capture-based sequencing performed in those two laboratories resulted in the 
characterization of 23 novel pseudo-complete ancient horse mitochondrial gen- 
omes (Supplementary Table 8.1). Additional sequencing was compatible with the 
characterization of draft nuclear genomes of two ancient horse specimens (Sup- 
plementary Tables 4.9 and 4.11): that of a Middle Pleistocene horse from Thistle 
Creek (560-780 kyr BP), and that of a Late Pleistocene horse from the Taymyr 
Peninsula (CGG10022, cal. 42,012-40,094 BC; Supplementary Table 2.3). The
Thistle Creek horse draft genome was characterized using Illumina (11,593,288,435
reads, Supplementary Table 3.2; coverage = 0.74×, Supplementary Table 4.11) and
Helicos sequence data (654,292,583 reads, Supplementary Table 3.5; coverage = 0.38×,
Supplementary Table 4.11). Ancient specimens were radiocarbon dated at Belfast
14Chrono facilities (Supplementary Tables 2.3 and 2.4). The Middle Pleistocene 
Thistle Creek horse bone is associated with infinite radiocarbon dates. 

Modern equine genomes from five modern horse breeds (Arabian, Icelandic, 
Norwegian fjord, Standardbred, Thoroughbred), one Przewalski’s horse individual 
and one domestic donkey were characterized using Illumina paired-end sequencing 
(Supplementary Information, sections 3.1.b.3-3.1.b.4). DNA was extracted and 
prepared into libraries (Supplementary Information, section 2.2) in laboratories 
located in buildings physically separated from ancient DNA laboratory facilities. 
Modern horse genomes were sequenced at the Danish National High-Throughput 
DNA Sequencing Centre whereas the donkey genome was characterized at BGI, 
Shenzhen (Supplementary Information, section 3.1). Trimmed reads were aligned to the
horse reference genome EquCab2.0 (ref. 26), excluding the mitochondrial genome 
and chrUn, using BWA” (Supplementary Information, section 4.2). We generated 
a draft de novo assembly of the donkey genome using de Bruijn graphs as imple- 
mented within SOAPdenovo”’ (Supplementary Information, section 4.1.a), built 
gene models using Augustus™* and SpyPhy* (Supplementary Information, section 
4.1.b), and identified candidate scaffolds originating from the X and Y chromo- 
somes (Supplementary Information, sections 4.1.c and 4.1.d). Sequence reads were 
also aligned against de novo assembled donkey scaffolds (Supplementary Inform- 
ation, section 4.2). For all genomes characterized in this study, we estimated that 
overall error rates were low (Supplementary Information, section 4.4.a), with 
type-specific error rates below 5.3 × 10⁻⁴, except for ancient genomes where
post-mortem DNA damage inflated the GC→AT mis-incorporation rates
(Supplementary Table 4.12). Metagenomic assignment of all reads generated from the
Thistle Creek horse bone was performed using BWA-sw” and mapping against a 
customized database, which included all bacterial, fungal and viral genomes avail- 
able (Supplementary Information, section 4.3). 

Genomic variation. SNPs were called for modern genomes using the mpileup 
command from SAMtools (0.1.18)*” and bcftools, and were subsequently filtered 
using vcfutils varFilter and stringent quality filter criteria (Supplementary 
Information, section 5.2). We compared overall SNP variation levels (Supplemen- 
tary Information, sections 5.2b and 11.2; Supplementary Table 11.10) present in 
modern horse genomes. We also compared genotypic information extracted from 
the genomes characterized in this study to that of 362 horse individuals belonging 
to 14 modern domestic breeds and 9 Przewalski’s horses**. Genotype and the 
breed/population of origin were converted into PLINK map and ped formats” 
and further analysed using the software Smartpca of EIGENSOFT 4.0 (ref. 40). 
PCA plots were generated using R 2.12.2 (ref. 41) (Supplementary Figs 5.6-5.14). 
Filtered SNPs that passed our quality criteria (Supplementary Information, section 
5.2.a) were categorized into a series of functional and structural genomic classes 
using the Perl script variant_effect_predictor.pl version 2.5 (ref. 42) available at 
Ensembl and the EquCab2.0 annotation database version 65 (Supplementary 
Information, section 5.2b). We also screened our genome data for a list of 36 loci 
that have been associated with known phenotypic defects and/or variants 
(Supplementary Information, section 5.2e and Supplementary Tables 5.19 and 
5.20). We systematically looked in the donkey genome for the presence of genes 
that have been identified in the horse reference genome as paralogues. This was 
performed by downloading from Ensembl a list of 15,310 paralogues and extract- 
ing genomic coordinates of the 15,171 paralogues that were located on the 31 
autosomes and the X chromosome. We next calculated the average depth-of- 
coverage of these regions using the alignment of donkey reads against the horse 
reference genome. A total number of 258 paralogues exhibited no hit and were 
putatively missing from the donkey genome. We further tested for the presence of
those paralogues in the different ancient horse genomes characterized here, using a 
model where the observed depth-of-coverage in an ancient individual (Illumina data) is a
function of the depth-of-coverage observed in a modern horse male individual, 
local %GC and read length (Supplementary Information, section 5.1c). A similar 
model was used for identifying segmental duplications in modern equid genomes 
(Supplementary Information, section 5.1b). 
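
The paralogue screen described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the per-base depth layout, the region tuples and the function names are all hypothetical, and real data would come from an alignment file rather than an in-memory list.

```python
from statistics import mean

def mean_depth(depth, start, end):
    """Average depth over the half-open interval [start, end) of a per-base depth list."""
    window = depth[start:end]
    return mean(window) if window else 0.0

def regions_without_hits(depth_by_chrom, regions):
    """Return the names of regions whose average depth is zero.

    depth_by_chrom: {chromosome: [per-base depth, ...]} (hypothetical input layout)
    regions: iterable of (name, chromosome, start, end), 0-based half-open
    """
    missing = []
    for name, chrom, start, end in regions:
        if mean_depth(depth_by_chrom.get(chrom, []), start, end) == 0:
            missing.append(name)
    return missing

# Toy data: donkey read depth along one short chromosome (not values from the study).
depth_by_chrom = {"chr1": [3, 4, 5, 0, 0, 0, 2, 1]}
regions = [("paralogue_A", "chr1", 0, 3), ("paralogue_B", "chr1", 3, 6)]
print(regions_without_hits(depth_by_chrom, regions))
```

Regions flagged this way are only putatively missing; the text goes on to model expected depth from %GC and read length before drawing conclusions for the ancient genomes.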

DNA damage. We estimated DNA damage levels in the Thistle Creek horse 
sample and compared these to the DNA damage levels observed among other 
Pleistocene horse fossil bones, all associated with more recent ages (Supplemen- 
tary Tables 2.3 and 2.4). All fossil specimens analysed were permafrost-preserved, 
limiting environmental-dependent variation in DNA damage rates*’. DNA frag- 
mentation and nucleotide mis-incorporation patterns were plotted using the 
mapDamage package** (Supplementary Information, section 6.2). We then 
developed a DNA damage likelihood model after the model presented in ref. 45, 
with slight modifications, where ancient DNA fragments consist of four non- 
overlapping regions from 5’ to 3’ ends: (1) a single-stranded overhang; (2) a 
double-stranded region that extends until a single-strand break is encountered; 
(3) a double-stranded region that extends 3’ of the single strand break previously 
mentioned, and; (4) a single stranded overhang (Supplementary Information, 
section 6.3 and Supplementary Fig. 6.39). All model parameters were estimated 
using maximum likelihood. Confidence intervals were found by taking each para- 
meter in turn and slowly adjusting that parameter while maximizing the likelihood 
with respect to all other parameters until finding the points above and below with 
likelihood 1.92 units below the maximum. Finally, we used the model framework 
presented in ref. 27 to recover direct estimates of DNA survival rates from next- 
generation sequence data (Supplementary Information, section 6.4). We restricted 
our analyses (1) to the distribution of templates showing sizes greater than the
modal size category; and (2) to collapsed paired-end reads, as the size of the latter 
corresponds to the exact size of ancient DNA fragments inserted in the DNA 
library. 
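
The confidence-interval procedure described above (moving one parameter until the log-likelihood drops 1.92 units below its maximum, that is, half the 95% critical value of a chi-squared distribution with one degree of freedom) can be illustrated with a one-parameter toy model. The binomial likelihood, the counts and the function names below are assumptions made for illustration; in the multi-parameter damage model, the log-likelihood at each trial value would first be re-maximized over all other parameters.

```python
import math

def binom_loglik(p, k, n):
    """Binomial log-likelihood (constant term dropped) for k events in n trials."""
    return k * math.log(p) + (n - k) * math.log(1.0 - p)

def likelihood_interval(loglik, mle, lower, upper, drop=1.92, tol=1e-10):
    """Find the two points where loglik falls `drop` units below its maximum."""
    target = loglik(mle) - drop

    def crossing(inside, outside):
        # Bisection: `inside` has loglik >= target, `outside` has loglik < target.
        while abs(outside - inside) > tol:
            mid = 0.5 * (inside + outside)
            if loglik(mid) >= target:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    return crossing(mle, lower), crossing(mle, upper)

k, n = 30, 1000                      # toy counts, not values from the study
mle = k / n
low, high = likelihood_interval(lambda p: binom_loglik(p, k, n), mle, 1e-9, 1 - 1e-9)
print(f"MLE = {mle:.4f}, interval = ({low:.4f}, {high:.4f})")
```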

Amino acid and proteomic analyses. A sample of the Middle Pleistocene Thistle 
Creek horse bone was embedded in Epothin resin under sterile conditions, cut and 
polished until chemical analysis of the sample surface could be performed with 
a time-of-flight secondary ion mass spectrometer (TOF-SIMS) instrument (Sup- 
plementary Information, section 7). We also performed high-resolution mass 
spectrometry (MS)-based shotgun proteomics analysis using two fragments from 
the Middle Pleistocene Thistle Creek horse bone (weighing 86 and 78 mg, respect- 
ively) in order to retrieve large-scale molecular information. The overall meth- 
odological approach follows the procedure that was previously applied to survey 
the remains of the bone proteome from three mammoth specimens living approx- 
imately 11-43 kyr ago’, although with significant improvements (Supplemen- 
tary Information, sections 7.2-7.3). Strict measures to avoid contamination and 
exclude false-positive results were implemented at every step, allowing us to
confidently profile 73 ancient bone proteins (from the attribution of 659 unique peptides
based on 13,030 spectra). Raw spectrum files were searched on a local workstation 
using the MaxQuant algorithm version 1.2.2.5 (ref. 46) and the Andromeda peptide 
search engine” against the target/reverse list of horse proteins available from 
Ensembl (EqCab2.64.pep.all), the IPI v.3.37 human protein database and the com- 
mon contaminants such as wool keratins and porcine trypsin, downloaded from 
Uniprot. The spectra were also searched against the Uniprot protein database, 
taxonomically restricted to chordates, and non-horse peptides were identified 
and eventually removed. Proteomic data were further compared to similar informa- 
tion already generated from fossil specimens collected in Siberian permafrost and 
temperate environments. Proteome-wide incidence of deamidation was estimated 
in relation with protein recovery to further assess the molecular state of preservation 
of ancient proteins. 

Phylogenetic analyses. The CDS of protein-coding genes were selected from the 
Ensembl website, keeping the transcripts with the most exons in cases where 
multiple records were found for a single gene. We then extracted corresponding 
genomic coordinates, filtered for DNA damage/sequencing errors, and aligned 
each gene using MAFFT G-INS-i (‘ginsi’)**“” (Supplementary Information, sec- 
tion 8.3a). Phylogenetic analysis was carried out using a super-matrix approach. 
First RAxML v7.3.2” was run to generate the parsimony starting trees. The final 
tree inference was performed using RAxML-Light v1.1.1°' and one GTRGAMMA 
model of nucleotide substitutions for each gene partition (codon positions 1 and 2, 
versus 3). Node support was estimated using 100 bootstrap pseudo-replicates. 
Bootstrap trees were dated using ‘r8s’, using the PL method and the Truncated 
Newton (TN) algorithm, with a smoothing value of 1,000 (ref. 52), or using the 
Langley-Fitch (LF) method (Supplementary Information, section 8.3.c). The date
of the root node was constrained to 4.0-4.5 Myr, the date of CGG10022 was fixed 
to 43 kyr, and the date of the Thistle Creek specimen was constrained to 560- 
780 kyr BP. We also performed phylogenetic analyses of whole mitochondrial
genomes (Supplementary Information, section 8.1), Y chromosome
(Supplementary Information, section 8.2) and a series of topological tests using approximately
unbiased tests as implemented in the CONSEL makermt program”’ (Supplemen- 
tary Information, section 8.3b). 

Demographic reconstructions. Past population demographic changes were 
reconstructed from whole diploid genome information using the pairwise sequen- 
tially Markovian coalescent model (PSMC)”' and excluding sequence data origin- 
ating from sex chromosomes and scaffolds (Supplementary Information, section 9). 
For low coverage genomes (<20X), we applied a correction based on an empirical 
uniform false-negative rate. Three different generation times of 5, 8 and 12 years 
were considered in agreement with the range of generation times reported in the 
literature*****°. Mutation rates were estimated using quartet genome alignments 
where the donkey was used as out-group (Supplementary Information, section 
10.1c). We also reconstructed past horse population demographic changes by 
means of Bayesian skyline plots using the software BEAST v1.7.2 (refs 57, 58) 
(Supplementary Information, section 9.2). Complete mitochondrial genomes were 
aligned and partitioned as described in Supplementary Information, section 8.1b, 
and a strict clock model was selected. We ran two independent MCMC chains of 50 
million iterations each, sampling from the posterior every 5,000 iterations. We 
discarded the first 10% of each chain as burn-in, and after visual inspection in 
Tracer v1.5 to ensure that the replicate chains had converged on similar values, 
combined the remainder of the two runs. 
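
As a worked example of the chain handling described above, the sketch below counts the sampled states and applies the 10% burn-in before pooling. It assumes roughly one sampled state per 5,000 iterations (ignoring the initial state), and the placeholder integers stand in for sampled parameter values; the function name is hypothetical.

```python
def post_burnin_samples(chain, burnin_percent=10):
    """Drop the first `burnin_percent` percent of the sampled states of one chain."""
    n_burn = len(chain) * burnin_percent // 100
    return chain[n_burn:]

iterations, sample_every = 50_000_000, 5_000
samples_per_chain = iterations // sample_every          # 10,000 sampled states

# Placeholder chains: integers stand in for sampled parameter values.
chain_a = list(range(samples_per_chain))
chain_b = list(range(samples_per_chain))

combined = post_burnin_samples(chain_a) + post_burnin_samples(chain_b)
print(samples_per_chain, len(combined))
```

Under these assumptions each chain contributes 9,000 post-burn-in states, so the combined posterior sample holds 18,000 states.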

Population split. We followed the method presented in ref. 20 to estimate the 
population divergence date of ancient and modern horses (Supplementary 
Information, section 10.1). This method was also applied to date the population 
divergence of Przewalski’s horses and domestic horses (Supplementary Informa- 
tion, section 10.2), as both our phylogenetic analyses and admixture tests supported 
those as two independent populations (Supplementary Information, sections 8.3 
and 12). In this method, we focus on heterozygous sites in one of the two popula- 
tions and randomly sample one of the two possible alleles (ancestral or derived) in 
the individual belonging to the first population. The number of times a derived allele 
is sampled (F statistics) can be used to recover a full posterior distribution of the 
population divergence time using (serial) coalescent simulations and approximate 
Bayesian computation (ABC) (Supplementary Information, section 10.1). For dat- 
ing the divergence time between the Przewalski’s horse population and domestic 
breeds, we also performed coalescent simulations using ms° assuming different 
divergence times in order to compute the expected relative occurrences of 4 geno- 
type configurations (Supplementary Information, section 10.2b). We assumed that 
no gene flow occurred after the population split, in agreement with the absence of 
detectable levels of admixture. The divergence time was then estimated by mini- 
mizing the root mean square deviation (r.m.s.d.) between observed and expected 
genotype configurations. We minimized the r.m.s.d. using a golden search algo- 
rithm. We repeated the minimization from different starting values to ensure 
convergence. 
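
The minimization step described above can be sketched as follows, assuming that the "golden search algorithm" refers to a standard golden-section search over a single parameter. The function `toy_expected` is a made-up stand-in for the simulated genotype-configuration frequencies, and the time units and brackets are arbitrary.

```python
import math

def rmsd(observed, expected):
    """Root mean square deviation between two equal-length sequences."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, expected)) / len(observed))

def golden_section_minimize(f, a, b, tol=1e-6):
    """Minimize a unimodal function f on [a, b] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)

def toy_expected(t):
    """Hypothetical frequencies of 4 genotype configurations at split time t."""
    x = math.exp(-t / 100.0)
    return [x, (1 - x) / 2, (1 - x) / 3, (1 - x) / 6]

observed = toy_expected(50.0)        # pretend data generated at t = 50
objective = lambda t: rmsd(observed, toy_expected(t))

# Repeat from several brackets to check that the minimizer agrees.
brackets = [(1, 200), (10, 500), (0.1, 1000)]
estimates = [golden_section_minimize(objective, lo, hi) for lo, hi in brackets]
print([round(t, 3) for t in estimates])
```

Golden-section search needs only function evaluations, which suits an objective built from simulations, but it assumes a single minimum inside the bracket; restarting from different brackets is a cheap check on that assumption.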

Selection scans. We used quartet alignments including the donkey as out-group, 
one ancient horse and two modern horses to scan for genomic regions where the 
two modern horses shared unusual accumulation of derived alleles (Supplemen- 
tary Information, section 11.1). We used a sliding window approach on the entire 
genome, with a window size of 200kb and calculated an unbiased proxy for 
selection using the ‘delta technique’ (see for example ref. 61). We then used an 
outlier approach to identify candidate loci with a conservative false-positive rate of 
0.01. We further retrieved transcript IDs from the different genomic regions 
identified and performed functional clustering analyses in DAVID”. We esti- 
mated genetic diversity (theta Watterson) within the Przewalski’s horse popu- 
lation and among modern horse breeds using sliding windows of 50 kb. For 
this, we estimated the population scaled mutation rate and used an empirical 
Bayes method where we took the uncertainty of the data into account by using 
genotype likelihoods instead of calling genotypes. We computed the genotype like- 
lihoods assuming a model similar to that of SAMtools version 0.1.18 (ref. 37) 
(Supplementary Information, section 11.2). Genomic windows showing excessive 
proportions of segregating sites with regards to species divergence (>5%) or cov- 
erage <90% were discarded. We estimated Tajima’s D following the same proce- 
dure and identified genomic regions showing minimal Tajima’s D values and low 
genetic diversity among breeds but not in the Przewalski’s horse population as a 
conservative set of gene candidates for positive selection among modern horse 
breeds. Finally, we scanned modern horse genomes for long homozygosity tracts, 
which could be indicative of selective sweeps®’. We used 2-Mb sliding windows and 
ignored sites showing coverage below 8. This resulted in the identification of
456 outlier regions within 8 modern horse genomes. 
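
For orientation, the two summary statistics named above can be computed for a single window as sketched below. This is a simplified illustration that uses called haplotypes coded as '0' (ancestral) and '1' (derived); the study instead works from genotype likelihoods with an empirical Bayes approach, which this sketch does not reproduce. The function name and toy haplotypes are hypothetical.

```python
import math

def window_summary(haplotypes):
    """Return (Watterson's theta, nucleotide diversity pi, Tajima's D) for one window.

    haplotypes: list of equal-length strings of '0'/'1' characters.
    Theta and pi are per window, not per site. D is None without segregating sites.
    """
    n = len(haplotypes)
    n_pairs = n * (n - 1) / 2
    seg_sites = 0
    pi = 0.0
    for column in zip(*haplotypes):
        derived = column.count("1")
        if 0 < derived < n:
            seg_sites += 1
            # Mean number of pairwise differences contributed by this site.
            pi += derived * (n - derived) / n_pairs
    if seg_sites == 0:
        return 0.0, 0.0, None

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = seg_sites / a1

    # Variance constants from Tajima (1989).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (pi - theta_w) / math.sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))
    return theta_w, pi, d

rare = ["1000", "0100", "0010", "0001"]        # only singletons
common = ["1100", "1100", "0011", "0011"]      # only intermediate-frequency variants
print(window_summary(rare))
print(window_summary(common))
```

An excess of rare variants pulls D below zero, as expected after a selective sweep, whereas intermediate-frequency variants push it above zero; this is why windows with minimal D and low diversity are treated as candidates.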

Admixture analyses. In order to investigate if there was evidence for gene flow 
between the Przewalski’s horse population and four modern horse domestic 
breeds (Arabian, Icelandic, Norwegian fjord and Standardbred), we performed 
ABBA-BABA tests””*. To avoid introducing bias due to differences in sequencing
depth, we based the tests on data obtained by sampling one allele randomly from
each horse at each site. First we used the domestic donkey as out-group, then the 
Middle Pleistocene Thistle Creek horse. When using the Thistle Creek horse as 
out-group we removed all sites showing transitions to avoid spurious patterns 
resulting from nucleotide misincorporations related to post-mortem DNA 
damage. We estimated the standard error of the test statistic using ‘delete-m 
Jackknife for unequal m’ with 10-Mb blocks** (Supplementary Information, sec- 
tion 12.1). We also scanned genome alignments to record the proportion of shared 
SNPs between Przewalski’s horse and each horse breed (Supplementary Informa- 
tion, section 12.6), a proxy for recent admixture events that are expected to result in 
the introgression of alleles from the admixer to the admixed genome and long 
tracts of shared polymorphisms. Finally, we compared our Przewalski’s horse 
individual to other individuals with different levels of admixture in their pedigree. 
We extracted genotype information from the Przewalski’s horse genome for SNP 
coordinates already genotyped across 9 Przewalski horse individuals**. Genotypic 
information from two Mongolian horses was added as out-group. We next selected 
the best model of nucleotide substitution using modelgenerator v0.85 (ref. 65) and
performed maximum likelihood phylogenetic analyses using PhyML 3.0 (ref. 66) 
(Supplementary Information, section 12.5). We further confirmed the phylogen- 
etic position of our Przewalski’s horse individual together with Rosa (KB3838), 
Basil (KB7413) and Roland (KB3063), three individuals for which no admixture with 
domestic horses could be detected in previous studies* by means of Approximate-
Unbiased (AU) and Shimodaira-Hasegawa (SH) tests, as implemented in CONSEL”.
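
The ABBA-BABA test and its block jackknife can be sketched as follows. The D statistic is computed from counts of ABBA and BABA site patterns, and the standard error follows the delete-m jackknife for unequal block sizes. Two choices here are assumptions rather than details taken from the text: each block is weighted by its number of informative (ABBA plus BABA) sites, and the block counts are toy values. The function names are hypothetical.

```python
import math

def d_statistic(abba, baba):
    """Patterson's D from counts of ABBA and BABA site patterns."""
    return (abba - baba) / (abba + baba)

def jackknife_d(blocks):
    """Delete-m jackknife for unequal m.

    blocks: list of (abba_count, baba_count), one entry per genomic block
            (for example, 10-Mb blocks); needs at least two non-empty blocks.
    Returns (D over all blocks, jackknife standard error).
    """
    blocks = [(a, b) for a, b in blocks if a + b > 0]
    total_abba = sum(a for a, _ in blocks)
    total_baba = sum(b for _, b in blocks)
    n = total_abba + total_baba
    g = len(blocks)
    d_all = d_statistic(total_abba, total_baba)

    h = [n / (a + b) for a, b in blocks]
    # D recomputed with each block removed in turn.
    d_minus = [d_statistic(total_abba - a, total_baba - b) for a, b in blocks]

    d_jack = g * d_all - sum((1.0 - 1.0 / hj) * dj for hj, dj in zip(h, d_minus))
    pseudo = [hj * d_all - (hj - 1.0) * dj for hj, dj in zip(h, d_minus)]
    variance = sum((p - d_jack) ** 2 / (hj - 1.0) for p, hj in zip(pseudo, h)) / g
    return d_all, math.sqrt(variance)

toy_blocks = [(30, 10), (25, 15), (28, 12), (22, 18)]   # made-up counts
d, se = jackknife_d(toy_blocks)
print(f"D = {d:.4f}, SE = {se:.4f}, Z = {d / se:.2f}")
```

Blocks are used because neighbouring sites are correlated through linkage; removing whole blocks gives a standard error that reflects this, and a Z score near zero is consistent with no admixture.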

Morphological analyses. We measured the metapodial of Thistle Creek early
Middle Pleistocene bone for 6 dimensions, despite incomplete preservation of 
its distal end (Supplementary Information, section 1.2). These measurements were 
compared to 30 metatarsals of E. lambei, 9 metatarsals of E. cf. scotti of Klondike, 
Central Yukon, Canada (Supplementary Information, section 1.2) and to extant 
horses (Supplementary Information, section 1.3). Comparisons were made using 
Simpson’s ratio diagrams that provide a standard and accurate comparison of both 
size and shape, for a single bone or a group of bones (Supplementary Figs 1.2 and 
1.3). We also measured taxonomically informative morphometric features on the 
skull and complete post-cranial skeleton of the modern Przewalski’s horse specimen
that was genome sequenced. We compared those to a collection of
measurements available for horses, filtering for specimens of similar age and using
principal component analyses (Supplementary Information, section 1.4). 

Variation and genetic control of protein abundance in humans

Linfeng Wu, Sophie I. Candille, Yoonha Choi, Dan Xie, Lihua Jiang, Jennifer Li-Pook-Than, Hua Tang & Michael Snyder


Gene expression differs among individuals and populations and is 
thought to be a major determinant of phenotypic variation. Although 
variation and genetic loci responsible for RNA expression levels have 
been analysed extensively in human populations’~*, our knowledge is 
limited regarding the differences in human protein abundance and 
the genetic basis for this difference. Variation in messenger RNA 
expression is not a perfect surrogate for protein expression because 
the latter is influenced by an array of post-transcriptional regulatory 
mechanisms, and, empirically, the correlation between protein and 
mRNA levels is generally modest®’. Here we used isobaric tag-based 
quantitative mass spectrometry to determine relative protein levels of 
5,953 genes in lymphoblastoid cell lines from 95 diverse individuals 
genotyped in the HapMap Project*’. We found that protein levels are 
heritable molecular phenotypes that exhibit considerable variation 
between individuals, populations and sexes. Levels of specific sets of 
proteins involved in the same biological process covary among indi- 
viduals, indicating that these processes are tightly regulated at the 
protein level. We identified cis-pQTLs (protein quantitative trait 
loci), including variants not detected by previous transcriptome 
studies. This study demonstrates the feasibility of high-throughput 
human proteome quantification that, when integrated with DNA 
variation and transcriptome information, adds a new dimension to 
the characterization of gene expression regulation. 


We used isobaric tandem mass tag (TMT)-based quantitative mass 
spectrometry to determine protein expression variation of lympho- 
blastoid cell lines (LCLs) derived from 95 ethnically diverse individuals 
genotyped in the HapMap Consortium. The samples consisted of 53 
Caucasians of northern and western European ancestry (CEU); 33 
Yorubans of African ancestry from Ibadan, Nigeria (YRI); eight Han 
Chinese from Beijing (CHB); and one Japanese from Tokyo (JPT). 
CHB and JPT were grouped together as East Asians (ASN). The 
ASN individuals were unrelated whereas the CEU and YRI groups 
included trios (mother, father and offspring), and had 42 and 23 unre- 
lated individuals, respectively. In each experiment, we used unique 
TMT tags to label trypsin-digested peptides from six cell lines, includ- 
ing a reference cell line (GM12878) and five other cell lines followed by 
two-dimensional liquid chromatography tandem mass spectrometry 
(2D LC-MS/MS) analysis (Fig. 1a). 

Fifty-one experiments were performed that included biological 
replicates; each resulted in an average of 54,000 high-confidence pep- 
tide identifications and quantifications. Protein expression in a cell 
line was quantified relative to the reference cell line, using peptides 
that uniquely mapped to a gene and lacked any known polymorphic 
protein coding variant among the 95 individuals (Supplementary 
Methods). A total of 5,953 proteins were quantified based on the 
analysis of 2,159,989 peptide spectra (Supplementary Table 1). To 


Figure 1 | Overview of workflow 
ensure adequate sample size and statistical power, most of the analyses
described below focused on the 4,053 proteins that were detected in 
more than 50% of the 74 unrelated individuals. 

To assess reproducibility, we analysed the correlation of protein 
level measurements between replicate and non-replicate cell lines. 
We observed that the Spearman’s rank correlation coefficient among 
non-replicates was much less than that of biological replicates, with 
median values 0.19 versus 0.56 (Supplementary Fig. 1a), suggesting 
that the TMT-based quantitative mass spectrometry technique can 
reproducibly detect variation in protein expression across individuals. 

We observed considerable inter-individual protein variation: a med- 
ian of 5.7% of the proteome changed more than 1.5-fold between pairs 
of individuals (Supplementary Fig. 1b). This figure is probably an 
underestimate because of precursor ion interference'”''. Although the 
CEU, YRI and ASN HapMap cell lines were established in separate 
batches and differ in age, the coefficients of variation (CV) estimated 
in the different populations are highly correlated (Spearman’s rank 
correlation coefficients 0.68-0.82, Supplementary Fig. 1c and
Supplementary Table 2), indicating that the level of inter-individual protein
variation is similar across populations; therefore the observed pattern of 
protein variation is unlikely to be dominated by these exogenous factors. 
Furthermore, by estimation of potential peptide phosphorylation, we 
found little evidence that the measurements of protein variation were 
influenced by post-translational modification (Supplementary Fig. 2). 
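
The pairwise summary quoted above (the median fraction of the proteome changing more than 1.5-fold between two individuals) can be computed along these lines. This is a sketch under simplifying assumptions: every protein has a positive relative level in every individual, with no missing values, and the individual labels and toy levels are hypothetical.

```python
from itertools import combinations
from statistics import median

def fraction_changed(levels_a, levels_b, fold=1.5):
    """Fraction of proteins whose ratio between two individuals exceeds `fold` in either direction."""
    changed = sum(1 for a, b in zip(levels_a, levels_b) if max(a / b, b / a) > fold)
    return changed / len(levels_a)

def median_pairwise_fraction(levels_by_individual, fold=1.5):
    """Median of fraction_changed over all pairs of individuals."""
    pairs = combinations(levels_by_individual.values(), 2)
    return median(fraction_changed(x, y, fold) for x, y in pairs)

# Toy relative protein levels (ratio to a reference cell line), 4 proteins each.
levels = {
    "ind1": [1.0, 1.0, 2.0, 0.5],
    "ind2": [1.0, 1.2, 1.0, 0.5],
    "ind3": [1.0, 2.0, 1.0, 1.0],
}
print(median_pairwise_fraction(levels))
```

Because ratio compression shrinks measured ratios towards one, a threshold applied to measured ratios would tend to undercount changed proteins, in line with the caveat in the text.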

To characterize the most and least variable proteins, we performed 
Gene Ontology (GO) category analysis and found that the most variable 
proteins were enriched in immune response, whereas the least variable 
proteins were enriched in housekeeping processes (Supplemen- 
tary Fig. 3). These findings are similar to those observed in previous 
mRNA studies'*. However, caution should be taken when comparing 
variability between proteins, because peptide ratios measured by iso- 
baric tag-based mass spectrometry can be distorted during precursor 
ion isolation’®"'. Because precursor ion interference mostly compresses 
the peptide ratio towards one, the underlying variation in some protein 
expressions may be substantially underestimated. Nonetheless, our 
results demonstrate a considerable variation in protein levels, particu- 
larly in immune response proteins. 

As a proof of principle demonstrating that the protein measure- 
ments reflect biological variation, we sought to detect protein variation 
associated with biological attributes such as sex and ethnicity. To avoid 
the correlation between parents and offspring, we only used unrelated 
individuals for the analyses below, with the exception of the heritability 
calculations, which were based on the trios. 

To identify proteins differentially expressed between males (n = 36) 
and females (n = 38), we regressed protein levels on sex, adjusting for 
average population differences (Supplementary Table 3). The distri- 
bution of P values for proteins exhibiting sex differences shows a 
modest enrichment at small P values (Supplementary Fig. 4a). At a 
false discovery rate (FDR) of 10%, 12 proteins are differentially 
expressed between sexes, among which seven have a Bonferroni cor- 
rected P value <0.05 and all seven map to the X or the Y chromosome 
(Supplementary Fig. 4b). These results indicate that our study captures 
bona fide variation in protein expression. 
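
The two multiple-testing thresholds used above can be illustrated with a short sketch. The study estimated the FDR with the QVALUE Bioconductor package; the code below instead uses the simpler Benjamini-Hochberg adjustment as a stand-in, alongside the Bonferroni correction, so the numbers it yields are not expected to match q-values exactly. Function names are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni-corrected P values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order.

    Used here as a simple stand-in for q-value estimation."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_at(adjusted, threshold=0.10):
    """Indices of tests whose adjusted value is at or below `threshold`."""
    return [i for i, value in enumerate(adjusted) if value <= threshold]
```

The Bonferroni correction controls the chance of any false positive and is therefore stricter, which is consistent with only a subset of the FDR-significant proteins passing it.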

Similarly, we examined population differences in protein expres- 
sion. We focused on the CEU and YRI unrelated individuals (42 CEU 
versus 23 YRI), as the ASN sample size was smaller. At an FDR of 10%, 
247 proteins are differentially expressed between CEU and YRI 
(Supplementary Table 4). The distribution of P values for population 
differences shows a much greater enrichment of small P values than for 
sex differences, and they are distributed throughout the genome 
(Fig. 1b, c). This finding further corroborates that our study can detect 
meaningful biological differences in protein expression. 

Proteins that are part of the same complex or in the same biological 
process might be expected to vary synchronously, indicative of a coordi- 
nated regulation of biological components and pathways. To determine 
if this is the case and to identify proteins that exhibit covariation, we 
constructed protein covariation networks using sparse partial correlation 
estimation’*. In a sparse network, which connects proteins showing the 
strongest evidence of direct correlation (Supplementary Methods), 223 
edges connect 278 proteins; these include five major clusters, each with at 
least 9 proteins (that is, nodes) (Fig. 2 and Supplementary Table 5). We 
performed GO category analysis for the five clusters; three were enriched 
in protein metabolic process (P = 4 X 10‘), translation (P = 2 X 107”) 
and glycolysis (P = 2 X 10"), respectively. We also found many smaller 
clusters that consisted of subunits of protein complexes, for example, 
minichromosome maintenance complex components. Many of these 
edges connect known interacting proteins. Enrichment analysis showed 
that the known interacting proteins are significantly enriched in the 
protein covariation network (P=5 X 10°). Relaxing the stringency 
of direct correlation while maintaining high statistical confidence, 
assessed by permutation and sub-sampling analyses (Supplementary 
Methods), yielded a denser network with 1,012 edges connecting 944 
proteins, featuring a ‘megacluster’ of proteins that is enriched in trans- 
lation (P=2 X 10 °) (Supplementary Table 6). These results demon- 
strated that protein expression in a cell is highly coordinated and that, for 
several important biological processes (for example, translation and gly- 
colysis), tight control of protein levels is maintained. 
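
The idea of 'direct' correlation behind the network can be illustrated with a first-order partial correlation, which removes the contribution of a third protein from the correlation between two others. This is a deliberately simplified sketch: the study used a sparse partial correlation estimator over all proteins, which is not reproduced here, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after conditioning on a single variable z."""
    return (r_xy - r_xz * r_yz) / sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
```

If two proteins correlate only because both track a third, the partial correlation falls to about zero and no edge would be drawn between them; an edge is kept when the correlation persists after conditioning.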

We also investigated the correspondence between protein-protein 
covariation and RNA-RNA covariation obtained by RNA sequencing 
(RNA-seq) in CEU and YRI LCLs**. We observed that covarying 
proteins tend to correspond to covarying RNAs with median correla- 
tion 0.42 for CEU and 0.21 for YRI (Supplementary Fig. 5). However, 
protein and RNA do not correlate perfectly, indicating that variation in 
protein levels is not entirely regulated through RNA expression. 
Figure 2 | Protein covariation network generated by sparse partial 
correlation estimation. Nodes represent proteins. Edges represent connection 
by covariation. This sparse network displays the 223 strongest connections 
among 278 proteins. Protein function is annotated by node colour. Edge colour 
is categorized according to correlation value. Known protein-protein 
interacting pairs are highlighted in larger nodes and labelled with gene names. 
To assess the extent and nature of the genetic factors that affect 
protein levels, we estimated the ‘narrow-sense’ heritability of protein 
levels, which represents the additive genetic component of protein 
levels and is estimated as the slope of regressing the offspring trait 
values on the average trait values of their parents. Median heritability 
of protein levels was 0.06 and 0.17 in CEU and YRI, respectively; 38% 
of the CEU proteins and 47% of the YRI proteins had a heritabi- 
lity higher than 0.2, respectively (Supplementary Fig. 6 and Sup- 
plementary Table 7). Overall, proteins in YRI cell lines show greater 
heritability than in CEU cell lines. Previous analyses on RNA level 
heritability have shown a similar trend', which may be attributable 
to the newer age of the YRI cell lines relative to the CEU cell lines. 
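
The heritability estimate described above, the slope of offspring values regressed on the mean of the two parents, can be sketched for a single protein as follows. The trio layout and function name are assumptions made for illustration.

```python
def midparent_offspring_slope(trios):
    """Narrow-sense heritability estimate for one protein.

    `trios` is a list of (father, mother, child) protein levels
    (hypothetical layout). The estimate is the ordinary least-squares
    slope of the child value on the mean of the two parental values."""
    midparent = [(father + mother) / 2.0 for father, mother, _ in trios]
    child = [c for _, _, c in trios]
    n = len(trios)
    mean_mid = sum(midparent) / n
    mean_child = sum(child) / n
    covariance = sum(
        (x - mean_mid) * (y - mean_child) for x, y in zip(midparent, child)
    )
    variance = sum((x - mean_mid) ** 2 for x in midparent)
    return covariance / variance
```

With only a few dozen trios per population, the slope for any single protein is noisy and can fall outside the range 0 to 1, which is one reason to summarize heritability by its median across proteins.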

We also tested the association of cis genetic variation with protein 
levels using HapMap phase III genotypes’. We limited the search for 
protein quantitative trait loci (pQTLs) to those single nucleotide polymorphisms
(SNPs) located within ±20 kilobases (kb) of the gene 
region with minor allele frequency (MAF) >10% in our samples. 
We performed a cis-pQTL analysis separately in CEU and YRI, and 
in CEU, YRI and ASN combined, in an effort to reveal pQTLs com- 
mon to all populations. Multiple loci throughout the genome displayed 
an excess of small P values (Fig. 3a and Supplementary Fig. 7a). At a 
10% FDR threshold, we detected 33, 13 and 77 genes with at least one 
significant pQTL in CEU, YRI and in all three populations combined, 
respectively (Table 1 and Supplementary Table 8). Of the 77 genes with 
a pQTL in the analysis combining all three populations, 34 were also 
identified in the CEU and/or YRI population. Indeed, the CEU pQTLs 
are highly enriched for significant P values and tend to have consistent 
regression coefficients or effect sizes in YRI (Supplementary Fig. 7b, c). 
These results suggest that there is a considerable overlap in the genetic 
architecture of protein expression across populations. The lower num- 
ber of significant pQTLs detected in YRI is probably a consequence of 
the smaller sample size. 
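
The SNP selection step of the cis-pQTL search can be sketched as below. This is an illustration under stated assumptions: genotypes are coded as 0, 1 or 2 copies of one allele, all SNPs passed in lie on the same chromosome as the gene, and the record layout and function names are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency from genotypes coded as 0, 1 or 2 allele copies."""
    allele_frequency = sum(genotypes) / (2.0 * len(genotypes))
    return min(allele_frequency, 1.0 - allele_frequency)


def select_cis_snps(snps, gene_start, gene_end, window=20_000, min_maf=0.10):
    """IDs of SNPs within `window` bases of the gene region whose MAF
    exceeds `min_maf`.

    `snps` is a list of dicts with keys "id", "pos" and "genotypes"
    (hypothetical layout)."""
    selected = []
    for snp in snps:
        in_window = gene_start - window <= snp["pos"] <= gene_end + window
        if in_window and minor_allele_frequency(snp["genotypes"]) > min_maf:
            selected.append(snp["id"])
    return selected
```

Each retained SNP would then be tested against the level of the corresponding protein, and the resulting P values pooled across genes for FDR estimation.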

To what extent do the genetic determinants that affect RNA levels 
coincide with those that regulate protein levels? To address this ques- 
tion the genetic regions that affect protein expression (pQTLs) were 
compared with those that affect RNA expression (eQTLs) previously 
identified in HapMap individuals using RNA-seq methods’. For each 
pQTL SNP, we obtained the P value for its association with RNA 
expression in CEU and YRI. Overall, we observed enrichment for small 
P values (Supplementary Fig. 8 and Supplementary Table 8), and we 
estimate that approximately one-half of pQTLs are probably also 
eQTLs. However, many pQTLs do not correspond to eQTLs, even at 
a relaxed statistical stringency. We note that the numbers of pQTLs 
detected in this study are relatively small due to the limited sample size. 
Therefore, the proportions of genetic variants contributing to both 
protein and mRNA variation and specific to protein variation should 
be considered as approximations. Nonetheless, our results indicate 
that despite an overlap between eQTLs and pQTLs, many pQTLs 
are distinct from eQTLs. 

Manual inspection of the individual pQTLs revealed interesting 
variants in several cases. OAS1 (2'-5’-oligoadenylate synthase 1) is 
an essential protein involved in the innate immune response to viral 
infection. Mutations in OAS1 have been associated with susceptibility 
to viral infection’. We identified a pQTL for OAS1. The variant show- 
ing the strongest correlation with OAS1 protein level is located at a 
splice site (rs10774671), where the G allele is associated with higher 
protein level than the A allele. OAS1 protein levels were calculated 
based on the quantification of 14 unique peptides, all of which are 
located before the splice site variant. Nine of them are shared by all 
known OAS1 isoforms in the literature. All of the used peptides have 
the same expression orientation at rs10774671, indicating that this 
SNP is associated with total protein level variation (Supplementary 
Fig. 9). The G allele at rs10774671 has previously been associated with 
higher enzyme activity but the underlying mechanism is unknown”. 
Our data indicate that this variant may influence the overall OAS1 
protein expression, in addition to giving rise to different isoforms. 

A second example, IMPA1 (inositol monophosphatase 1), is a 
putative target for lithium in the treatment of bipolar disorder’’, but 
no IMPA1 genetic variant has been associated with bipolar disease!’, 
nor has an eQTL been identified for this gene in recent RNA-seq 
Table 1 | Number of cis-pQTLs at different FDR 

Group               Number   Number of  Number    Number of genes with a pQTL
                    of LCLs  proteins   of tests  10% FDR  20% FDR  30% FDR
CEU                 41       3,984      116,556   33       54       122
YRI                 22       4,017      121,405   13       34       50
Three populations*  72       4,021      130,505   77       134      239

* CEU, YRI and ASN populations combined. 


studies**. We found that SNP rs1058401, located at the 3’ untranslated 
region (UTR) of the IMPA1 gene, is associated with protein levels. We 
first explored a fine cis-pQTL mapping of the IMPA1 gene using 
denser SNP coverage. We selected all the SNPs within ±200 kb of 
the IMPA1 gene from HapMap phases I, II and III with a MAF 
>5%. Several SNPs on or near the 3’ UTR show significant pQTL 
effect in CEU and in the three populations combined (Fig. 3b). We 
validated this pQTL by immunoblot analyses in both CEU and YRI 
(Fig. 3c, d and Supplementary Fig. 10). The results are consistent with 
the data obtained using mass spectrometry, confirming that rs1058401 
is indeed associated with IMPA1 protein levels. 

We also evaluated the correlation between IMPA1 protein and 
mRNA levels, and observed a poor correlation between protein and 
mRNA in the combined sample (r = 0.04, P = 0.76, Supplementary 
Fig. 11) or in CEU alone (r = −0.19, P = 0.27). However, protein and 
mRNA levels do show moderate correlation in YRI (r=0.50, 
P=0.02). The rs1058401 SNP showed no evidence of association with 
RNA levels measured in CEU (P = 0.56), moderate evidence of association
with RNA levels in YRI (P = 0.008), and much stronger evidence
of association with protein levels (P = 3 × 10⁻⁷, in the combined 
populations analysis). We checked whether this SNP is associated with 
mRNA decay rate using results from a recent report’*, and found no 
support for such a hypothesis. Therefore, this pQTL may have a sig- 
nificant role in regulating gene expression at the translational level. 

We describe the first systematic interrogation of the genetic effects 
on the human proteome using isobaric tag-based quantitative mass 
spectrometry. Our results demonstrate the power of quantitative mass 
spectrometry data for analysis of protein co-regulation and uncovering 
genetic effects influencing protein abundance. With a larger number of 
cell lines and improvement of mass spectrometry technology, the 
number of pQTLs is likely to increase substantially. Some, but not 
all, pQTLs overlap with those identified in eQTL studies. These results 
indicate that distinct and diverse genetic mechanisms control gene 
expression at many different levels, suggesting that important and 
complementary knowledge can be acquired by systematically char- 
acterizing the human proteome. 


METHODS SUMMARY 


Lymphoblastoid cell lines (LCLs) from 95 HapMap individuals were obtained 
from the Coriell Institute for Medical Research. All trypsin-digest mixtures 
were analysed on an LTQ Orbitrap Velos (Thermo Scientific) equipped with an 
online 2D nanoACQUITY UPLC System (Waters) as previously described, with 
modifications'’. The acquired mass spectrometry raw data were searched against a 
human International Protein Index (IPI) database, version 3.75*°, concatenated 
with a decoy database with all the protein sequences in reverse order, using 
SEQUEST algorithm”! (Proteome Discoverer software, version 1.2, Thermo 
Scientific). The correspondence between proteins, genes (Ensembl gene IDs) 
and genomic loci was established based on the protein and gene cross-reference 
tables of IPI database version 3.87 and transcript sequences of Ensembl database 
release 62. Screening of peptides overlapping with protein coding changes was 
based on genotypes and annotation releases by the HapMap and 1000 Genomes 
Project’”***. To estimate the false discovery rate for sex, population and pQTL 
analyses, the QVALUE Bioconductor package was used**. For full methods, see 
Supplementary Information. 
A single pair of interneurons commands the 
Drosophila feeding motor program 


Thomas F. Flood'*, Shinya Iguchi'*, Michael Gorczyca'*, Benjamin White’, Kei Ito® & Motojiro Yoshihara’ 


Many feeding behaviours are the result of stereotyped, organized 
sequences of motor patterns. These patterns have been the subject 
of neuroethological studies’”, such as electrophysiological char- 
acterization of neurons governing prey capture in toads’. However, 
technical limitations have prevented detailed study of the functional 
role of these neurons, a common problem for vertebrate organisms. 
Complexities involved in studies of whole-animal behaviour can be 
resolved in Drosophila, in which remote activation of brain cells by 
genetic means* enables us to examine the nervous system in freely 
moving animals to identify neurons that govern a specific behaviour, 
and then to repeatedly target and manipulate these neurons to 
characterize their function. Here we show neurons that generate 
the feeding motor program in Drosophila. We carried out an 
unbiased screen using remote neuronal activation and identified a 
critical pair of brain cells that induces the entire feeding sequence 
when activated. These ‘feeding neurons’ (here abbreviated to Fdg 
neurons for brevity) are also essential for normal feeding as their 
suppression or ablation eliminates sugar-induced feeding behaviour. 
Activation of a single Fdg neuron induces asymmetric feeding behaviour 
and ablation of a single Fdg neuron distorts the sugar-induced 
feeding behaviour to become asymmetric, indicating the direct role 
of these neurons in shaping motor-program execution. Furthermore, 
recording neuronal activity and calcium imaging simultaneously 
during feeding behaviour’ reveals that the Fdg neurons respond 
to food presentation, but only in starved flies. Our results dem- 
onstrate that Fdg neurons operate firmly within the sensorimotor 
watershed, downstream of sensory and metabolic cues and at the top 
of the feeding motor hierarchy, to execute the decision to feed. 

To identify neurons controlling feeding behaviour we have behaviou- 
rally screened flies in which randomly targeted neurons are activated to 
induce the feeding motor program in a small, temperature-controlled 
chamber (Supplementary Fig. 2). To genetically target random sets of 
neurons, we took advantage of the collection of NP (Nippon) lines®. 
Each of these lines expresses the yeast transcription factor Gal4 in a 
different stereotyped pattern of neurons that depends on the GAL4 
insertion site’. Gal4-expressing cells were activated by mating flies of 
each line to flies with a transgene encoding a rat cold-activated cation 
channel, TRPM8 (ref. 8), or a Drosophila heat-activated-channel, 
TRPA1 (ref. 9), under the control of upstream activating sequences 
(UAS) recognized by Gal4. A screen of 835 NP lines identified the GAL4 
line NP883, which showed continuous feeding behaviour with TrpA1 
at increased temperature. The induced behaviour was compared with 
natural feeding behaviour’ (Fig. 1a-d, Supplementary Figs 3, 4 and 
Supplementary Videos 1, 2). The natural feeding pattern, evoked by 
contact with food, is characterized by an initial cessation of locomotion 
followed by the sequential execution of eight basic motor patterns 
(Fig. 1a) for taking up food by repeated proboscis extension/retraction 
and opening/closing labellar lobes at the tip of the proboscis. This 
behavioural sequence was reproduced in a food-free environment by 
TRPA1-mediated activation of neurons in the Gal4-expressing pattern 


(Fig. 1b and Supplementary Video 2). The TRPA1-induced sequence 
was well-coordinated and indistinguishable from natural feeding behaviour 
in the duration of proboscis extension, labellar contact with the substrate 
and proboscis retraction (Fig. 1d). We observed repeated labellar open- 
ing even with the rostrum and haustellum immobilized (Supplementary 
Figure 1 | Thermogenetic activation reproduced coordinated natural 
feeding behaviour. a, Natural feeding behaviour of a starved wild-type (WT) 
fly on normal food at 21 °C, consisting of eight basic motor patterns: (1) all 
main joints of the fly’s forelegs (black arrowheads) bend to bring the head closer 
to the food (dashed horizontal lines for comparison of head heights); (2) the 
rostrum (magenta arrowheads) projects forward (a1 to a3) while (3) the 
haustellum (blue arrowheads) extends downward (a1 to a3), resulting in 
protrusion of the proboscis; (4) the paired lobes at the tip of the proboscis, called 
labella (green arrowheads), open upon touching the food to take up food (a2 to 
a3); (5) taking food, the labella close (a3 to a4) and (6) the rostrum and (7) the 
haustellum retract, returning the entire proboscis to its original position while 
(8) the forelegs (black arrowheads) raise the body to its original position (a3 to 
a5). b, TRPA1-induced proboscis extension in a satiated NP883 > TrpA1 fly at 
31 °C with the eight basic motor patterns indistinguishable from a. No food was 
present. c, Schematic drawings depict unfolding sequence of major segments of 
proboscis. d, Comparison of time taken for each step in proboscis extension in 
c. n = 22 for each genotype. e, Proboscis-extension rate of free-running, 
satiated flies observed singly in an arena without food at 31 °C for each 
genotype (see Methods for description of the other line, NP5137, with a similar 
expression pattern to NP883 (Supplementary Fig. 8c)). Magenta bars denote 
mean values. ***P < 0.001 (see Methods for statistics). n = 40 for each 
genotype. f, Temperature dependence of proboscis-extension rate without food 
for free-running, satiated flies for each genotype. n = 40 for each genotype at 
each temperature. Error bars in all figures are s.e.m. 
Fig. 5 and Supplementary Video 2), demonstrating independence of 
the induced behavioural sequence from sensory cues, which have been 
thought to be required for labellar lobe opening after contact with 
food*"®. Although flies in which the feeding program was induced by 
TRPA1 stimulation generally opened their labella upon touching the 
plastic/glass substrate of the chamber (Fig. 1b b2-b3, Supplementary 
Figs 3b, 4b), some did so without touching the substrate (Supplemen- 
tary Fig. 4c, d and Supplementary Video 2). The induced feeding thus 
represents a ‘fixed action pattern’"’, which is completely executed with- 
out food or substrate, although the natural feeding must be coordi- 
nated with sensory stimuli as well. To quantify the feeding behaviour 
induced by stimulation of neurons in the NP883 pattern we adopted 
the extension and retraction of the rostrum and haustellum (numbers 
2, 3, 6 and 7 in the sequence outlined in the legend of Fig. 1a), referred 
to as ‘proboscis extension’. Measured by this index, the induced feeding 
behaviour observed in NP883 > TrpA1 flies at increased temperature 
required both the NP883-GAL4 insertion and the TrpA1 transgene 
(Fig. 1e) and exhibited a temperature dependence consistent with 
the activation properties of the Drosophila TRPA1 channel (Fig. 1f). 
Feeding was acutely induced in both sexes (Supplementary Fig. 6). 
An essential component of feeding behaviour not measured by 
proboscis extension is the rhythmic activity of the pharyngeal pump, 
which is used for swallowing food*’*"’. To assess whether the beha- 
viour induced in NP883 > TrpA1 flies included activation of the pha- 
ryngeal pump, we fluorescently labelled the pharyngeal muscles using 
green fluorescent protein expressed under control of the enhancer in 
the myosin heavy chain (Mhc) locus (Mhc-GFP)"* (Fig. 2a, b) so that 
they could be observed through the cuticle. These muscles, m11, m12-1 
and m12-2 (Fig. 2b-d), are attached to the upper of the two 
sclerotized plates, which lie on top of one another (Fig. 2e and 
Supplementary Fig. 7a). Observing the action of the pump using a 
dye-coloured sugar solution to visualize fluid flow upon presentation 
to a starved fly revealed the dynamics of pump movement shown in 
Fig. 2a-e, Supplementary Fig. 7a—c and Supplementary Video 3. In 
brief, m12-1, m12-2 and m11 sequentially contract and relax in an 
alternating manner to lift first the anterior and then the posterior parts 
of the upper plate to generate rhythmic peristaltic waves of the upper 
plate, which move ingested material from the mouth to the oesophagus 
between the two plates. Counting of individual pump cycles under 
m12-1 and m12-2 (Fig. 2f) revealed that sucrose-induced feeding in 
a starved fly was mediated by vigorous pumping at 6-8 Hz when the 
temperature was set at 29 °C, with the rate declining gradually as the fly 
became satiated, then leading to a fully swollen crop in 2 min (Fig. 2g). 
When we tested NP883 > TrpA1 flies in the Mhc—GFP background we 
found that temperature increase induced intermittent pumping that 
was indistinguishable from natural pumping (Supplementary Video 3). 
Satiated NP883 > TrpA1 flies showed only occasional pumping at 
21 °C, but pumped the sugarless dye solution at 6-8 Hz at 29 °C in a 
pattern indistinguishable from that of wild-type starved flies with 
sucrose solution (magnified plots in Fig. 2g). Satiated wild-type flies 
used as controls showed much less pumping of the dye solution, even at 
29°C, compared to NP883 > TrpA1 flies at the same temperature 
(Fig. 2g), as total pumping pulses were quantified to show a sevenfold 
difference (Supplementary Fig. 7d). Although the induced total pump- 
ing during 2 min was 40% lower than the sucrose-induced pumping of 
starved wild-type flies (Supplementary Fig. 7d), comparison of rates 
(Fig. 2h) showed that the NP883 > TrpA1 flies maintained pumping at 
the same steady-state rate as starved wild-type flies, although the latter 
initially pumped more vigorously in response to sucrose. By measuring 
the net amount of ingested fluid, we observed that, in the first 2 min, 
induced pharyngeal pumping led to ingestion of 4.7-fold more fluid 
than wild-type controls (Supplementary Fig. 7e), and after approximately
5 min led to a fully swollen crop, although the crop was not filled 
at 2 min (Fig. 2g and Supplementary Video 3). Taken together, our 
results demonstrate that the activation of Gal4-expressing cells in 
Figure 2 | Thermogenetically induced food ingestion through the 
pharyngeal pump. a-e, Mechanism of pharyngeal pump for normal ingestion 
in a wild-type fly. A fly starved for 24h was fed an aqueous 100 mM sucrose 
solution with blue dye. a, A side view. Muscles are highlighted with 
Mhc-GFP. Area in white box is magnified in c. b, Micrographs (left, dissection 
microscope; right, confocal microscope) to illustrate muscle structure. Each of 
three muscle fibre groups, m11, m12-1 and m12-2, consists of several muscle 
fibres, forming the pharyngeal pump. m12-1 was described previously as 12 
(ref. 12), whereas m12-2 was identified in this study with the help of 
Mhc-GFP. Scale bar, 100 µm. c, Still images at 21 °C (Supplementary Video 3). 
White arrowheads denote edge of m12-1; black arrowheads denote edge of 
m11. Blue arrows denote flow of ingested material. Asterisks denote the main 
lumen enlarged (also in d and e). d, e, Schematic diagram for muscle 
movement. The two sclerotized plates are depicted as light brown, whereas the 
apodeme, which protrudes from the upper plate and is attached to the muscles, 
is depicted as dark brown. f, Visualization of ingestion with blue dye in the 
space under m12-1 and m12-2 (arrowheads) for counting a single pulse in 

g. Left, closed state without dye; right, open state with dye. The proboscis of the 
fly was allowed to move freely as in its natural state. g, Representative raster 
plots (two for each condition) to show pumping events during a 2-min period at 
21°C and 29 °C. Right, representative flies for each 29 °C groups after 2 min. 
Arrowhead, belly with sugarless dye without starvation. h, Pumping rates 
plotted at every 10s in 29 °C groups in g. n = 11 (wild type, sucrose), 11 
(NP883 > TrpA1), 18 (wild type, control). Error bars denote s.e.m. 
NP883 produces the complete feeding motor program consisting of all 
essential motor patterns, including pharyngeal pumping. 

To identify the specific neurons within the NP883 expression pattern 
(Fig. 3a and Supplementary Figs 8a, b, 9) that activate feeding behaviour, 
we used the ‘flip-out GAL80’ technique”’, in which the initially ubiquit- 
ous expression of Gal80, an inhibitor of Gal4-mediated transcription, 
is eliminated in small numbers of neurons by the flippase-mediated 
random removal of the GAL80 gene (see Methods). By this means, we 
simultaneously expressed TRPA1 and GFP in small subsets of the 
NP883 pattern and identified the GFP-labelled neurons, whose pres- 
ence we could correlate with TRPA1-induced feeding. From screening 
1,243 flies (Supplementary Fig. 10) we dissected the flies that showed a 
proboscis-extension rate of six times per min or above, a value not 
observed in non-flipped-out specimens (Supplementary Table 1). 
Examination of the 40 proboscis-extension-positive flies led to the 
identification of one type of interneuron, the GFP expression of which 
correlated with flies having a higher proboscis-extension frequency as 
seen in the histogram of Fig. 3b. We termed this pair of interneurons 
Fdg neurons. They possess a distinct and stereotypical morphology 
with extensive arborization (Fig. 3d, Supplementary Figs 10c and 11, 
and Supplementary Video 5), which can be unambiguously identified 
in the full expression pattern of NP883 (arrowheads in Fig. 3a). They 
are located in the suboesophageal ganglion (SEG (also called SOG)), 
where axons of gustatory sensory neurons terminate’*"* (the primary 
gustatory centre in the fly brain) and where motoneurons for mouth- 
part muscles extend their dendritic arborizations’®. To understand 
information flow to and from the Fdg neuron, we defined its cellular 
architecture in relation to these sensory inputs and motor outputs by 
the flip-out GAL80 with synaptic markers (Fig. 3e and Supplementary 
Fig. 12). Figure 3b shows the distribution of proboscis-extension rate in 
the ‘Fdg neuron with GFP group’ to be significantly skewed to high 
Figure 3 | Identification of the Fdg neuron. a, Full expression pattern of 
NP883 in the SEG as a confocal section, which covers both Fdg neurons 
(arrowheads). b, c, A histogram of proboscis-extension rate in 40 proboscis- 
extension-positive (proboscis extension > five times per min) flies, with GFP 
detected in Fdg neuron (b) or ALLH neuron (c) filled in black. **P < 0.01, 
Mann-Whitney’s U-test between ‘with GFP’ and ‘no GFP’. d, The Fdg neuron, 
labelled with anti-GFP antibody (green). A confocal montage of the SEG (lower 
half) and antennal lobes (upper half), with the neuropil marker antibody, nc82 
(magenta). e, Fdg neuron with the presumptive axon digitally traced in deep 
yellow, on the basis of synaptic marker analyses in Supplementary Fig. 12. 
Arrow indicates position where the axon posteriorly branches off from cell 
body fibre (CBF), then, the sub-branches travel dorsally and ventrally as axon 
terminals (Ax, arrowheads). CB, cell body. All scale bars, 30 µm. 
values in contrast to the ‘no Fdg neuron with GFP’ group by Mann- 
Whitney’s U-test. By contrast, another identified neuron type within 
the NP883 pattern, which we called ALLH (antennal lobe and lateral 
horn) neurons (Supplementary Fig. 13a, b), did not exhibit such a skew 
(Fig. 3c). In addition, none of the other six prominent SEG cell types 
within the NP883 expression pattern showed a statistically significant 
correlation between GFP expression and proboscis-extension frequency 
(Supplementary Fig. 13). This was also the case for all cells outside of 
the SEG (Supplementary Table 2), suggesting that the Fdg neuron is 
responsible for the feeding behaviour with higher proboscis-extension 
rates (see also Supplementary Note 1). 
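
The comparison between the 'with GFP' and 'no GFP' groups rests on the Mann-Whitney U statistic, which can be sketched as a direct pair count. This is an illustration only: it computes the statistic, not its P value, and the function name is hypothetical.

```python
def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U statistic for group_a versus group_b.

    Counts the pairs in which the value from group_a exceeds the value
    from group_b; tied pairs contribute one half."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u
```

A value of U close to the product of the two group sizes means that nearly every fly in the first group had a higher proboscis-extension rate than every fly in the second, which is the kind of skew described for the Fdg neuron group.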

Supplementary Fig. 10d shows the induced behaviour (Supplemen- 
tary Video 4) of a fly that had strong GFP expression only in a single 
Fdg neuron, as shown in Fig. 3d and Supplementary Fig. 10c. The 
behaviour induced by TRPA1 in this fly clearly included all eight 
motor patterns of the natural feeding program following the initial 
cessation of locomotion, indicating that activation of a single Fdg 
neuron can induce the entire sequence of feeding behaviour. It should 
be noted that the feeding behaviour observed in Fdg-neuron-positive 
flies contrasted with that of NP883 > TrpA1 flies in that it included 
more walking (Supplementary Video 4) and lacked the leg tremors 
observed in NP883 > TrpA1 flies (Supplementary Video 2), probably 
owing to suppression of TRPA1 expression in other cells by Gal80. In 
these respects, the behaviour resulting from the activation of individual 
Fdg neurons more closely mimicked natural feeding behaviour. 
Interestingly, we observed an unusual directionality to the proboscis 
extensions produced by flies in which single Fdg neurons were activated. 
As shown in Supplementary Fig. 10e, Supplementary Video 4 and Sup- 
plementary Table 2, proboscis extension was consistently directed 
towards the side the GFP-expressing Fdg neuron was on. This asym- 
metric regulation of proboscis extension by the Fdg neuron suggests 
that each Fdg neuron may selectively regulate the strength of proboscis 
muscle contraction on the same side of the body, consistent with the 
observation that presentation of food to gustatory receptors on one 
side of the body leads to proboscis extension on that side (Supplemen- 
tary Video 1). 

To determine whether Fdg neuron activity is required for natural 
feeding, we first suppressed activity of all neurons in the NP883-GAL4 
pattern by the expression of an inward rectifier potassium channel, 
Kir’, leading to abolishment of natural feeding behaviour in response 
to sucrose (Supplementary Fig. 14 and Supplementary Video 6). The 
main sugar-sensing neurons of the labellum, which express the gustatory 
receptor GR5A‘*”’, terminate in the vicinity of the dendrite of the Fdg 
neuron, but careful confocal analysis revealed no direct contact between 
the processes of the two types of neuron (Supplementary Fig. 15). To 
determine whether the Fdg neurons receive indirect input from the 
sugar-sensing neurons, we assayed their response to gustatory stimuli 
by calcium imaging using a genetically encoded Ca²⁺ indicator, 
GCaMP3.0 (ref. 20), driven by NP883-GAL4. To achieve this, we used 
a specially designed setup (that is, the feeding circuit/fly brain live 
imaging and electrophysiology stage, or FLIES) to visualize SEG neurons 
through an opening in the head (Fig. 4a, b). As shown in Fig. 4c, 
stimulation of the labellar lobes of a starved fly with 400 mM sucrose 
resulted in brief lobe opening, and a simultaneous, large increase in 
GCaMP3.0 fluorescence in the cell body of the Fdg neuron (Sup- 
plementary Video 7). This response was specific insofar as an adjacent 
neuron, which we called LPE (lateral peri-esophageal) (Supplementary 
Figs 13 and 16a), showed no increase in GCaMP3.0 fluorescence, even 
in starved flies (Supplementary Fig. 17a). Furthermore, red fluorescent 
protein (RFP) fluorescence did not change with the sucrose stimulus 
(Fig. 4c and Supplementary Fig. 17a). Interestingly, neither labellar 
opening nor Ca²⁺ increase in the Fdg neuron was observed in satiated 
flies (Supplementary Fig. 17). Our results thus indicate that sucrose 
acutely activates the Fdg neurons, and that this response is contingent 
on the metabolic state of the animal. 

Figure 4 | Functional analyses of Fdg neuron. a, b, Experimental design with 
FLIES chamber for experiments in this figure. c, Ca²⁺ imaging of Fdg neuron 
when NP883 > GCaMP3.0; mCD8-rfp flies were stimulated with 400 mM 
sucrose. Top, a representative GCaMP3.0 fluorescence at the cell body of Fdg 
neuron (arrowheads) in a starved fly. Dashed circle denotes quantified area 
outlining an Fdg neuron cell body. Scale bar, 10 µm. Middle, a time course 
of GCaMP3.0/RFP fluorescence as ratios to the initial fluorescence in a 
representative example. Bottom, labellar lobe opening (arrowheads) with other 
parts of proboscis immobilized. Arrow, sucrose wick for stimulation. 
Quantification and statistics are given in Supplementary Fig. 17. d, Laser 
activation of a single Fdg neuron in a satiated fly. Left panels, proboscis 
extension to the fly’s right side in response to laser stimulation of the fly’s right 
Fdg neuron cell body to activate TRPA1 under a two-photon microscope. Right 
panels, pump movement induced by laser activation of a single Fdg neuron on 
either side. Dye solution was applied through a capillary tube as shown in 
Fig. 2f, to see ingestion through the pharyngeal pump (white arrow) and the 
oesophagus (arrowhead) immediately after laser illumination. e, Laser ablation 
of a single Fdg neuron. A cell body of a single Fdg neuron was intensely 
illuminated under a two-photon microscope. Blue arrows denote sucrose- 
induced proboscis-extension directions before (left panels) and after (middle 
panels) ablation. Right panels, abolishment of sucrose-induced proboscis 
extension after ablation of both Fdg neurons. Chain lines denote fly’s midline. 
White arrows denote sucrose wick. 

The FLIES setup allowed us to image and focally heat the cell body of 
a single Fdg neuron expressing TRPA1 and GFP using limited illu- 
mination of the infrared laser of a two-photon microscope for activa- 
tion of the Fdg neuron (see Methods and Supplementary Fig. 16b). As 
expected from the results of our flip-out GAL80 studies, this stimulus 
caused immediate asymmetric proboscis extension to the side of the 
stimulated neuron (Fig. 4d and Supplementary Video 8). It also induced 
pump movement (Fig. 4d and Supplementary Video 8). By contrast, 
selective illumination of an LPE neuron, located only 10–20 µm from 
the Fdg neuron (Supplementary Figs 13 and 16a, c), failed to induce 
proboscis extension or pump movement at the same stimulus level (see 
Methods and Supplementary Video 8). These results directly confirmed 
that activation of a single Fdg neuron can trigger feeding behaviour in a 
specific manner. 

To test the requirement for the Fdg neuron in natural feeding beha- 
viour, we selectively ablated Fdg neurons in starved NP883 > gfp flies 
using stronger laser illumination (see Methods and Supplementary 
Fig. 16d). Ablation of the Fdg neuron on one side, followed by stimu- 
lation with 400 mM sucrose, triggered proboscis extension in the 
direction opposite to the ablated side, whereas ablation of the Fdg 
neurons on both sides completely eliminated the response to sucrose 
(Fig. 4e and Supplementary Video 9). In control experiments, ablation 
of the nearby LPE neuron did not affect the proboscis-extension response, 
again demonstrating the specificity of the manipulation. Consistent 
with the results of Kir suppression (Supplementary Fig. 14), these results 
demonstrate that Fdg neurons are essential for natural feeding in the fly 
and indicate the absence of neurons with redundant function. 

The induction of the entire feeding program by Fdg neuron activation 
contrasts with the effects of activating motor neurons that innervate 
muscles of the proboscis or the pharyngeal pump. The induction 
of feeding by Fdg neurons is likewise distinct from that produced by 
stimulation of neurons that co-express neuropeptide Y and agouti- 
related protein in the mammalian hypothalamus, which has long 
latencies (that is, minutes versus seconds) and involves indirect regulation 
of motor output. The activity of these hypothalamic neurons encodes metabolically derived 
motivational cues and contrasts with that of the Fdg neurons, which 
clearly encode integrated information of both gustatory and metabolic 
origin, and drive motor output in a manner that is perhaps most reminiscent 
of the ‘command neurons’ (interneurons whose natural activity 
triggers a specific motor program) first described in the crayfish. The 
motor—as opposed to motivational—function of the Fdg neurons is 
evident in their asymmetric control of proboscis extension, which 
indicates a specific role of each Fdg neuron in contraction of a subset 
of the proboscis musculature. How the Fdg neurons coordinate the 
various motor patterns involved in feeding remains to be determined. 
Pump rhythms (Fig. 2), like the well-characterized movements of the 
crustacean stomatogastric nervous system, may result from the action 
of a central pattern generator governed by intrinsic membrane properties 
and inhibitory interactions of the component neurons. Recently, 
co-activation of motoneurons controlling m11 and m12-1 has been 
shown to generate rhythmic contractions of the pharyngeal pump, 
and activation of these neurons by the Fdg neurons might be the source 
of the pump central pattern generator. As seen in Supplementary 
Video 5, the large dendritic arborization of the Fdg neuron, which is 
reminiscent of the putative feeding neurons of toads and courtship 
neurons of Drosophila, suggests a role in integrating information 
beyond sugar and starvation cues, including perhaps other gustatory 
cues, such as bitter or salty, and signals of other modalities. In any case, 
our laser ablation experiments suggest that inputs that govern feeding 
responses probably pass through the single pair of Fdg neurons. The 
identification of these neurons here and the demonstration of their 
pivotal position in the feeding circuit open the door to systematic 
future experiments on their roles in sensory integration and its plas- 
ticity in fly feeding behaviour. 


METHODS SUMMARY 


In behavioural observation, the temperature was maintained within ±1 °C of a set 
temperature. Immunohistochemistry was performed according to a protocol 
described previously with a modification for adult brains. Ca²⁺ imaging as well 
as laser activation and inactivation were performed using the FLIES apparatus, which 
was designed to expose the brain of a fly for general purposes such as live imaging 
and electrophysiology, and to keep the fly’s proboscis dry and free for movement. A 
sugar-free saline (1.5 mM Ca²⁺) used previously for Drosophila electrophysiology 
was continuously perfused at 21 °C. The head capsule was opened by a 
tungsten ‘sword’, which was originally designed for dissection of a Drosophila 
embryo used in study of synaptic plasticity, and by ‘scisceps’, which are forceps 
modified to act as scissors. An ultrathin, smooth, traditional Japanese Washi 
paper, Gampi-shi (Haibara), was used as a wick for sugar stimulus. 


Full Methods and any associated references are available in the online version of 
the paper. 

METHODS 


Immunohistochemistry. We performed immunolabelling according to a protocol 
described previously with a modification for adult brains (see Supplementary 
Methods for details). 

Saline and dissection tools. A sugar-free saline used previously for Drosophila 
embryonic electrophysiology was also used in this study. The saline contained (in 
mM): NaCl, 140; KCl, 2; MgCl₂, 4.5; CaCl₂, 1.5; and HEPES-NaOH, 5 (pH 7.1). 
The head capsule was opened by a tungsten ‘sword’, which was originally designed 
for dissection of a Drosophila embryo used in study of synaptic plasticity, and by 
‘scisceps’, which are forceps modified to act as scissors. 
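
For bench use, the millimolar recipe above can be converted to grams per litre. The following is only a minimal sketch and not part of the original protocol. The molar masses are standard values for the anhydrous salts and for free-acid HEPES, which is an assumption because the text does not state which hydrates were used. The names `SALINE_MM`, `MOLAR_MASS` and `grams_per_litre` are hypothetical, and the NaOH needed to reach pH 7.1 is not computed.

```python
# Saline composition as stated in the text (concentrations in mM).
SALINE_MM = {"NaCl": 140.0, "KCl": 2.0, "MgCl2": 4.5, "CaCl2": 1.5, "HEPES": 5.0}

# Molar masses in g/mol.
# ASSUMPTION: anhydrous salts and free-acid HEPES; substitute the value
# for the hydrate actually weighed out (for example a dihydrate).
MOLAR_MASS = {"NaCl": 58.44, "KCl": 74.55, "MgCl2": 95.21,
              "CaCl2": 110.98, "HEPES": 238.30}


def grams_per_litre(recipe_mm, molar_mass, volume_l=1.0):
    """Convert a recipe in mM into grams of each component for volume_l litres."""
    return {
        name: conc_mm / 1000.0 * molar_mass[name] * volume_l
        for name, conc_mm in recipe_mm.items()
    }


if __name__ == "__main__":
    for name, grams in grams_per_litre(SALINE_MM, MOLAR_MASS).items():
        print(f"{name}: {grams:.4f} g per litre")
```

Keeping the molar masses in a separate table makes it easy to swap in the value for a hydrated salt without touching the recipe itself.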

Fly strains. Drosophila crosses were performed at 21 °C or 25 °C according to 
standard protocols. Canton S was used as the wild-type control. Transgenic strains 
were balanced with FM7c, CyO, TM3 or TM6B chromosomes. UAS-TRPM8 has 
been previously described. UAS-TrpA1 (ref. 9) was obtained from P. Garrity. 
UAS-mCD8-mCherry was made by A. Sheehan, and generously provided by M. 
Freeman before publication. Mhc-gfp was from K. Gajewski, UAS-mCD8-gfp 
and a heat-shock-flippase (HS-FLP) strain were from T. Lee. >Tubulin-GAL80> 
(in which ‘>’ denotes a flip recombination target, FRT, sequence to 
be recognized by flippase; the Tubulin-GAL80 sequence between the two FRT 
sequences is excised when flippase is provided; and Tubulin is used for ubiquitous 
expression of Gal80), made by G. Struhl, was from M. Roshbash, elav-GAL80 
(ref. 32) was from Y.-N. Jan, Mhc-GAL80 (ref. 33) was from L. Luo, Cha-GAL80 
(ref. 34) was from S. Waddell, UAS-brp-gfp and UAS-nAChR-gfp (UAS-Dα7-gfp) 
were from S. Sigrist, UAS-n-syb-gfp was from M. Ramaswami, 
Gr5a-gfp-IRES-gfp-IRES-gfp was from K. Scott, UAS-Kir2.1 (ref. 39) was from V. Budnik, 
tubP-GAL80ts (temperature sensitive) was from S. Waddell, UAS-GCaMP3.0 
(ref. 20) was from L. Looger and UAS-mCD8-rfp was from T. Awasaki. 

We used two GAL4 strains that were established by the NP consortium. The 
NP883 line has a GAL4 insertion approximately 500 base pairs 5′ upstream of the 
untranslated region of Cyp6a14 (ref. 42), a locus encoding a member of the 
cytochrome P450 family, which functions for electron transfer. Although none 
of the GAL4 lines screened showed temperature-induced behaviour similar to 
NP883, another NP line not included in the screen, NP5137, was later identified 
as having an insertion at a more proximal site to the coding region of Cyp6a14 
(ref. 42). This line exhibited a similar pattern of feeding behaviour (Fig. 1e, f) when 
driving TRPA1, and included the Fdg neuron in its expression pattern, in common with 
NP883 (Supplementary Fig. 8c). 
Observation of fly behaviour. For observing TRPA1-induced behaviour, we used 
a custom-built plastic chamber (Supplementary Fig. 2), which enabled the temperature 
gradient to be maintained within ±1 °C from its floor to ceiling (height 
4 mm) at experimental temperatures. The chamber was designed to fit snugly 
into a Nunc 35-mm plastic dish, and temperature was regulated by a TS-4 SPD 
Controller (Physitemp) and monitored with an IT-23 probe connected to a 
microprobe thermometer, BAT-10 (Physitemp) (Supplementary Fig. 2). The 
inside of the fly chamber was cleaned after each use. We observed fly behaviour 
by the usual techniques (see Supplementary Methods). 

For observing labellar movement with immobilized proboscis (Supplementary 
Fig. 5), we anaesthetized a fly in a 15-ml plastic tube immersed in ice, and placed 
the fly in a Pipetman tip with its tip cut to expose its head. The rostrum and 
haustellum of the fly’s proboscis were fixed using light-curing glue (Tetric 
EvoFlow, Ivoclar Vivadent). The fly held in the Pipetman tip was videotaped for 
1 min at 21°C. Then, the Pipetman tip holding the fly was placed in the temper- 
ature-controlled chamber pre-warmed to 31-32 °C. One minute later, when the 
temperature monitored by the temperature probe reached 31-32 °C, we video- 
taped the labellar lobes for 1 min. After that, we took the Pipetman tip holding the 
fly out from the chamber and placed it in the room at 21 °C, and we videotaped 
labellar lobes for 1 min. 

Visualization and quantification of pump movement and quantification of 
dye ingestion amount. For visualization of pump movement, a fly with Mhc-gfp 
was constrained in a Pipetman tip as stated above. For natural feeding, an aqueous 
100 mM sucrose solution with 0.03 mg ml⁻¹ Brilliant Blue FCF (Acros Organics) 
was provided to a 24-h-starved fly through a hypodermic needle. For visualizing 
m11, the hypodermic needle was placed anterior to make the rostrum fully protracted. 
For induced feeding, the rostrum of a fly with NP883 > TrpA1 with Mhc-gfp 
constrained in a Pipetman tip was glued to be immobilized at fully protracted 
position with light curing glue. Then, the Pipetman tip was placed in the temper- 
ature-controlled chamber with small holes, through which a hypodermic needle 
passed to provide the blue dye solution. The hypodermic needle was loaded on a 
joystick manipulator (Narishige, MN-151) and connected with a flexible plastic 
tube to an injector (Narishige, IM-5B) to constantly supply dye solution. Fluore- 
scence from Mhc-GFP was observed using Leica MZ10F with a charge-coupled 
device (CCD) camera for videotaping. This fluorescence was supplemented with 
additional light from fibre optics to visualize the fly’s mouthpart structures. We 
noticed that, in partially satiated wild-type flies and sometimes in NP883 > TrpA1 
flies, contractions of m12-1 and m12-2 were not necessarily associated with those 
of m11 (Supplementary Video 3), causing backward flow from the spherical 
lumen (Supplementary Video 3). To quantify ingestion directly, we therefore 
measured the net amount of ingested fluid as follows. 

For quantification of pump movement and ingestion amount, a tethered fly on 
its back was provided with the dye solution through a glass capillary tube, which 
was loaded on a manipulator and connected to an injector as stated above. The 
glass capillary allowed us to measure the amount of solution ingested by measuring 
the length of dye solution filling the capillary. In the same experiments, the fly was 
videotaped with a CCD camera through a Nikon SMZ-800, and dye movement 
under m12-1 and m12-2 (Fig. 2f) was characterized. 
Flipping screening for feeding flies. We used flies with the following genotype 
for flipping experiments for TRPA1 and GFP: HS-FLP (X chromosome); > Tubulin- 
GAL80 > UAS-TrpA1/NP883 (second chromosomes); UAS-mCD8-gfp/+ (third 
chromosomes). We used flies with the following genotype for flipping experiments 
for TRPA1, mCherry and BRP-GFP: HS-FLP (X-chromosome); > Tubulin-GAL80 > 
UAS-TrpA1/NP883 (second chromosomes); UAS-mCD8-mCherry UAS-brp-gfp/+ 
(third chromosomes). A series of similar experiments was also performed using 
UAS-n-syb-gfp instead of UAS-brp-gfp. We used flies with the following genotype 
for flipping experiments for TRPA1, mCherry and nAChR-GFP: HS-FLP (X 
chromosome); > Tubulin-GAL80> UAS-TrpA1/NP883 (second chromosomes); 
UAS-mCD8-mCherry UAS-nAChR-gfp/+ (third chromosomes). We used flies 
with the following genotype for flipping experiments for TRPA1, mCherry and 
GR5A-GFP: Gr5a-gfp-IRES-gfp-IRES-gfp/HS-FLP (X chromosome); >Tubulin- 
GAL80> UAS-TrpA1/NP883 (second chromosomes); UAS-mCD8-mCherry/+ 
(third chromosomes). 

These flies were aged 2–5 days after eclosion and tested at 37 °C to observe 
feeding behaviour. Behaviour was quantified by counting proboscis extensions for 
1 min after an incubation period of 30 s upon introduction into the behavioural 
chamber. Heat shock was not necessary as flipping was active at normal temperature 
(21 °C). For assessing asymmetry, we analysed movie frames with bottom 
views, and scored proboscis extension as asymmetric if the centre of the labella extended beyond 5% of the 
distance between the midline and lateral edge of the fly’s head at least three times. 
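
The asymmetry criterion above can be expressed as a short scoring routine. This is a minimal sketch under stated assumptions, not the scoring procedure actually used: it assumes that each proboscis extension has been reduced to one signed lateral offset of the labella centre from the midline (positive towards the fly's right), measured in the same units as the midline-to-edge distance. The function names and the per-side counting are hypothetical.

```python
def count_lateral_extensions(offsets, half_head_width, threshold=0.05):
    """Count extensions whose labella centre passes the threshold on each side.

    offsets: signed lateral offsets from the midline, one per extension
             (ASSUMPTION: positive = fly's right, negative = fly's left).
    half_head_width: distance between the midline and the lateral head edge.
    threshold: fraction of half_head_width that must be exceeded (5% in the text).
    Returns (left_count, right_count).
    """
    limit = threshold * half_head_width
    right = sum(1 for x in offsets if x > limit)
    left = sum(1 for x in offsets if x < -limit)
    return left, right


def is_asymmetric(offsets, half_head_width, threshold=0.05, min_events=3):
    """True if one side exceeds the threshold at least min_events times."""
    left, right = count_lateral_extensions(offsets, half_head_width, threshold)
    return max(left, right) >= min_events


if __name__ == "__main__":
    # Illustrative offsets only (fractions of the half head width), not data
    # from the study.
    example = [0.10, 0.08, 0.12, 0.01]
    print(count_lateral_extensions(example, 1.0), is_asymmetric(example, 1.0))
```

Counting each side separately is a design choice of this sketch; the text specifies only the 5% threshold and the minimum of three occurrences.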

For describing Fdg neuron branching pattern, more than ten samples with an 
isolated and complete Fdg neuron were analysed in detail from the behavioural 
screening described in the text. For each analysis of cellular architecture, at least 
four good samples for each genotype were analysed in detail by isolating, immuno- 
labelling and microscoping more than ten feeding-positive flies from each series of 
behavioural screening. 
Suppression by Kir2.1 channel. The inward rectifier potassium channel 
Kir2.1 (ref. 39) was expressed exclusively in the adult stage using the TARGET 
system to avoid any developmental effects. Flies with NP883, UAS-Kir2.1, tubP-GAL80ts 
were reared at 19 °C, and collected within 1 day after eclosion. NP883, 
UAS-Kir2.1 flies were lethal; thus, the suppression by Gal80ts through development 
was mandatory. The flies were starved and temperature-shifted in the protocols 
depicted in Supplementary Fig. 14. They were starved in a vial with a wet 
paper towel at the bottom. Starved flies were anaesthetized by chilling in a test tube 
standing on ice, and gently held by Pipetman tip as previously described. For 
observation of proboscis extension response (PER), the proboscis was stimulated 
by a 100-mM aqueous sucrose solution on a wick inserted into a 1-ml syringe. A 
special ultrathin, smooth, traditional Japanese Washi paper, Gampi-shi (Haibara), 
was used as a wick. This was sturdy and held solution well and was transparent 
when wet, all improvements for reducing experimental variation when compared 
to KimWipes. After making a very small droplet of sugar solution at the wick, the 
piston of the syringe was pulled, and at the moment when the droplet was sucked, 
the wet surface of the Washi wick was applied to the tip of proboscis. These 
manipulations were done quickly to prevent the animal from drinking sucrose 
solution and mitigating its starved state (Supplementary Video 6). Before sucrose 
application, water was applied to make sure the fly was not thirsty. If the fly 
responded just to water, it was given water to the point of satiation. Each fly 
was given five presentations of sucrose solution and PER was counted. Between 
each sucrose presentation, water was applied to clean labellar lobes. As no difference 
between males and females was recognized, half of the results were from male 
flies and half were from females. 

For observation of free-running behaviour with Kir2.1, flies of a given genotype 
were placed individually in a chamber (3.5 mm × 10 mm, 2 mm height) with a 
sheet (1-mm thickness) of standard fly food on one side (10 mm × 2 mm). 
Behaviour was observed for 2 min and videotaped in the 25 mm × 10 mm 
(2 mm height) space through the top glass (3.5 mm × 10 mm) at each time point 
for each genotype. Proboscis extension that reached the food was counted for 30 s, 
1–1.5 min after placing the fly into the chamber, and proboscis extension per 
minute was calculated. 

Calcium imaging with observation of proboscis extension response. Ca²⁺ 
imaging as well as laser activation and ablation were performed using FLIES, which 
was designed to expose the brain of a fly for general purposes such as live imaging, 
electrophysiology and to keep the fly’s proboscis dry and free for movement. Ca²⁺ 
imaging was performed by a method modified from that previously reported. An 
adult fly was anaesthetized in a 15-ml plastic tube standing on ice and set in a tube 
attached to a FLIES apparatus. Light-curing glue was used to seal the proximally 
adjacent part of the rostrum to the inner edge of the chamber’s hole. To minimize 
movement artefact, we immobilized the proboscis, which we kept half extended to 
prevent the pump unit from bumping into and from occluding the SEG, with light- 
curing glue leaving only labellar lobes free to move. In the saline described above, 
the head capsule was opened by the tungsten ‘sword’, and by the ‘scisceps’ to better 
clip the cuticle and trachea and expose the SEG. The oesophagus, muscle 16 (ref. 12) 
and the antennal nerves were removed, and air sacks were stretched to the side to 
expose an Fdg neuron’s cell body and to avoid movements that could add noise to 
the Ca²⁺ signal. Ca²⁺ imaging was performed following a previous report. We 
scanned the cell body of an Fdg neuron through a 40× water immersion lens 
(0.80 numerical aperture), using the spinning disk confocal laser system, CSU X1 
(Improvision/Yokogawa) using Volocity software, v4.3, on a BX51 WI microscope 
(Olympus). mCD8-RFP was co-expressed to check movement artefact, and 
GCaMP3.0 and mCD8-RFP were labelled at the same time. GCaMP3.0 signal 
was imaged with an exposure time of 300 ms of 491-nm laser for detection, and 
mCD8-RFP fluorescence was imaged with a 535-nm laser with an exposure time 
of 100 ms every 1.4 s. GCaMP3.0 fluorescence and mCD8-RFP fluorescence at the 
cell body of an Fdg neuron were quantified at a region of interest using the Volocity 
software (Improvision). Identification of an Fdg neuron by its location was confirmed 
by immunolabelling with anti-GFP antibody recognizing GCaMP3.0 after 
Ca²⁺ imaging experiments. Throughout the experiments, saline was slowly (one 
drop per second) perfused. Perfusion dramatically reduced the spontaneous move- 
ment of a proboscis, which is one key source of movement artefact. The proboscis 
was stimulated by an aqueous sucrose solution in the same manner as PER experi- 
ments with Kir suppression. Labellar bristles (Supplementary Fig. 3a) sensed the 
sucrose and the proboscis extended reproducibly if flies were starved for 24 h 
immediately before PER experiments (Supplementary Video 7). PER behaviour 
was monitored and recorded through a CCD camera attached to a dissection 
microscope (SMZ-800, Nikon) supported by a swing arm at the same time as 
GCaMP3.0 was being imaged by the spinning disk confocal microscope. In the 
case of NP883 > GCaMP3.0 flies, starvation effect is accelerated probably due to 
interaction between GCaMP3.0 and cellular Ca²⁺ ions, and these flies show a full 
PER in response to 100 mM sucrose by only 13 h of starvation. Starvation for too 
long, such as for 24 h, decreases the probability of PER. Therefore, we started 
dissection in this series of Ca²⁺ imaging experiments at around 13 h of starvation. 
To gain an unambiguous response, we stimulated with 400 mM sucrose because 
weak responses by 100 mM sucrose tended to be confused with background activ- 
ity or movement artefact even after several attempts to reduce movement artefact 
as stated above. We checked PER by 100 mM sucrose before dissection, and only 
flies exhibiting PER behaviour were dissected. For satiated experiments, flies were 
placed in a grape juice/yeast pasted food vial for more than 1 h, and only flies that 
did not show a PER to sucrose stimulation were dissected. We took data within 
1.5 h after starting dissection. Details of these methods were published, but many 
details were improved from the published methods to minimize movement 
artefact to unambiguously detect small responses in Fdg neurons. 
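
The quantification described here, in which GCaMP3.0 fluorescence is divided by the co-expressed mCD8-RFP signal and expressed relative to the initial value (Fig. 4c), can be sketched as follows. This is an illustration under stated assumptions and not the Volocity workflow itself: it assumes one mean region-of-interest intensity per frame for each channel, takes the first frame as the "initial fluorescence", and applies no background subtraction. The function name is hypothetical.

```python
def normalized_ratio(gcamp, rfp):
    """Return the GCaMP/RFP ratio per frame, relative to the first frame.

    gcamp, rfp: sequences of mean ROI intensities, one value per frame,
                acquired at matching time points.
    ASSUMPTION: the first frame serves as the baseline ("initial fluorescence").
    """
    if len(gcamp) != len(rfp) or len(gcamp) == 0:
        raise ValueError("channels must be non-empty and of equal length")
    if any(r == 0 for r in rfp):
        raise ValueError("RFP intensity must be non-zero in every frame")
    ratios = [g / r for g, r in zip(gcamp, rfp)]
    baseline = ratios[0]
    if baseline == 0:
        raise ValueError("baseline ratio must be non-zero")
    return [x / baseline for x in ratios]


if __name__ == "__main__":
    # Illustrative intensities only, not measurements from the study.
    print(normalized_ratio([100.0, 150.0, 200.0], [50.0, 50.0, 50.0]))
```

Dividing by the RFP channel means that a movement artefact affecting both channels equally cancels out, whereas a genuine Ca²⁺ response changes only the numerator.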

Laser activation and laser ablation of an Fdg neuron. The FLIES apparatus was 
used, and experiments were performed under the Zeiss two-photon microscope, 
LSM7 MP. The fly was set into the FLIES apparatus stated above, but the proboscis 
was left free for observation of its movement, especially for testing asymmetry of 
proboscis extension. Dissection was done in the same manner as in the Ca²⁺ 
imaging experiments. We used the same saline as that used for Ca²⁺ imaging in 
this series of laser activation and ablation experiments. 

For laser activation, we first briefly imaged a satiated NP883 > TrpA1; mCD8-gfp 
fly and identified an Fdg neuron and an LPE neuron (control), limiting infrared 
illumination as much as possible to avoid triggering activity (Supplementary Fig. 16). 
Then we set a 15.4-µm (55 pixels at 3× zoom; pixel size, 0.28 µm) 
diameter region of interest surrounding the cell body (Fdg neuron or LPE neuron). 
We set the circle so that its diameter was twice the diameter of the cell, so as not to 
miss the cell body even after small movements, which were inevitable because the 
proboscis was moving freely. Using the ‘test bleaching’ program of the LSM7 MP 
system’s Zen software, we scanned the area of the circle for 120 ms total (four 
iterations) with 20% power at a laser setting of 870 nm. We set the scan speed at 
5, which corresponded to a pixel dwell time of 12.61 µs, and selected the ‘zoom 
bleach’ function on the software. This scanning protocol was repeated three times 
approximately 10 s apart. We could see an obvious proboscis extension, usually by 
the third scanning but sometimes earlier, presumably by a facilitative effect. Only one 
neuron was illuminated for each preparation (either the Fdg neuron or the LPE neuron) to 
exclude the possibility of activating the cell as a result of repeated laser scanning. At 
the end of the experiment, a live scan of the cell body was taken, and no obvious 
damage was observed (Supplementary Fig. 16b, c). We positioned a Nikon SMZ-800 
stereomicroscope supported with a swing arm in front of the fly to observe and 
videotape movement of the proboscis, using fibre illumination. To suppress background 
TRPA1 opening and to improve the spatiotemporal resolution of laser 
activation, we perfused saline chilled to 21 °C during the entire experiment. 
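
The geometry of the activation region of interest follows directly from the stated pixel count and pixel size, and can be checked with a few lines of arithmetic. This is a minimal sketch and the function names are hypothetical. The implied cell-body diameter is an inference from the statement that the circle was set to twice the diameter of the cell, not a measurement reported in the text.

```python
import math


def roi_diameter_um(pixels, pixel_size_um):
    """Diameter of a circular region of interest in micrometres."""
    return pixels * pixel_size_um


def roi_area_um2(diameter_um):
    """Area of a circular region of interest in square micrometres."""
    return math.pi * (diameter_um / 2.0) ** 2


if __name__ == "__main__":
    # Values stated in the text: 55 pixels at 0.28 um per pixel.
    diameter = roi_diameter_um(55, 0.28)
    # INFERENCE: the ROI was set to twice the cell diameter, so the
    # implied cell-body diameter is half the ROI diameter.
    implied_cell_diameter = diameter / 2.0
    print(f"ROI diameter: {diameter:.1f} um")
    print(f"Implied cell diameter: {implied_cell_diameter:.2f} um")
    print(f"ROI area: {roi_area_um2(diameter):.1f} um^2")
```

The same helper reproduces the stated diameter for any other pixel count at this zoom, since only the pixel size enters the conversion.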

For laser activation observing pump movement, a similar experimental setup 
was used as described above, except the oesophagus was left intact to allow proper 
ingestion of dye aqueous solution. The rostrum of the proboscis was extended out 
with gentle vacuum and glued to the eaves of the FLIES apparatus with light curing 
glue to visualize the pharyngeal pump. Before activation, flies were allowed access 
to ingest dye solution to avoid dehydration. Numbers of samples were: 4 (Fdg 
neuron for proboscis extension), 5 (Fdg neuron for pumping), 5 (LPE neuron for 
proboscis extension), 5 (LPE neuron for pumping). All preparations for the Fdg 
neuron showed positive effects; that is, proboscis extension in the direction to the 
same side as illuminated Fdg neuron, or a pump pulse just after laser illumination 
on either side of Fdg neuron. Control illumination on LPE neuron had no obvious 
effect on proboscis extension or pump pulses. In the absence of TRPA1 expression, 
we saw no induction of feeding behaviour upon infrared illumination of the Fdg 
neuron, also confirming the specificity of the manipulation (data not shown, n = 5). 

For laser ablation, we used basically the same type of experimental setup as that 
for laser activation. We first briefly imaged a starved NP883 > mCD8-gfp fly and 
identified an Fdg neuron and an LPE neuron. Then we set a 2.8-µm (10 pixels at 3× 
zoom; pixel size, 0.28 µm) diameter circle inside the targeted cell body 
(Fdg neuron or LPE neuron). We set the scan speed at 4, which corresponded to a 
pixel dwell time of 25.21 µs, and selected the ‘zoom bleach’ function within the 
software. Using the ‘test bleaching’ program, we scanned the area of the circle for 
10 ms in total (five iterations) with 30% power at 870 nm. The strong laser made a 
damaged-looking cell body (arrowhead in Supplementary Fig. 16d) to confirm 
that the cell was ablated. In some cases, we could observe a small, transient bubble, 
which shrank and disappeared in a few seconds, then ended up with the aforemen- 
tioned damaged look. Before and after ablation, we tested PER with 400 mM sucrose 
stimulation. We perfused saline chilled to 21°C during all experiments. After 
ablation of a neuron, we waited for 15 min until ablation effect appeared on PER. 
Numbers of samples were: 5 (Fdg neuron), 5 (LPE neuron). All Fdg neuron ablation 
gave consistent results with those in Fig. 4e and Supplementary Video 9, whereas 
ablation of LPE neuron showed no recognizable effect on proboscis extension. 

For assessing asymmetry both in laser activation and in laser ablation, we 
analysed movie frames, and judged asymmetry if the midline of the labella 
extended beyond 5% of the distance between the midline and the lateral edge of 
the fly’s head. 

Statistics. All statistical analyses were performed according to standard methods 
using Prism, v5.0a (GraphPad Software) and Excel (Microsoft). 

For statistics in Fig. 1e, the six groups were analysed with the Kruskal-Wallis 
test (a one-way analysis of variance by ranks), and a significant difference 
between groups was found (P < 0.0001). *** denotes P < 0.001 by Dunn’s post-hoc 
multiple comparison test between progeny from this cross: NP883 × UAS-TrpA1 
(NP883 > TrpA1), compared to these crosses: NP883 × wild type, wild 
type × UAS-TrpA1, or wild type × wild type. The same post-hoc analysis was 
performed for the NP5137 line. 
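
To make the rank-based test concrete, the Kruskal-Wallis H statistic can be computed by hand as in the sketch below. This is not the Prism analysis used in the study: it is a plain-Python illustration that pools the groups, assigns average ranks to ties, and applies the usual tie correction. It returns only H; the P value and Dunn's post-hoc comparisons are not implemented. The function names and the example numbers are hypothetical and are not data from the paper.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic for a list of groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over groups.
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (N^3 - N) over groups of tied values.
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in counts.values()) / float(n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


if __name__ == "__main__":
    # Made-up proboscis-extension counts for three hypothetical groups.
    print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Under the null hypothesis H is compared with a chi-squared distribution with (number of groups − 1) degrees of freedom; a statistics package would normally be used for that step and for the post-hoc comparisons.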

High-resolution analysis with novel cell-surface 
markers identifies routes to iPS cells 


James O'Malley, Stavroula Skylaki, Kumiko A. Iwabuchi, Eleni Chantzoura, Tyson Ruetz, Anna Johnsson, 
Simon R. Tomlinson, Sten Linnarsson & Keisuke Kaji 


The generation of induced pluripotent stem (iPS) cells presents a 
challenge to normal developmental processes. The low efficiency 
and heterogeneity of most methods have hindered understanding 
of the precise molecular mechanisms promoting, and roadblocks 
preventing, efficient reprogramming. Although several intermediate 
populations have been described, it has proved difficult to 
characterize the rare, asynchronous transition from these inter- 
mediate stages to iPS cells. The rapid expansion of minor repro- 
grammed cells in the heterogeneous population can also obscure 
investigation of relevant transition processes. Understanding the 
biological mechanisms essential for successful iPS cell generation 
requires both accurate capture of cells undergoing the reprogram- 
ming process and identification of the associated global gene 
expression changes. Here we demonstrate that in mouse embryonic 
fibroblasts, reprogramming follows an orderly sequence of stage 
transitions, marked by changes in the cell-surface markers CD44 
and ICAM1, and a Nanog-enhanced green fluorescent protein 
(Nanog-eGFP) reporter. RNA-sequencing analysis of these popu- 
lations demonstrates two waves of pluripotency gene upregulation, 
and unexpectedly, transient upregulation of several epidermis- 
related genes, demonstrating that reprogramming is not simply 
the reversal of the normal developmental processes. This novel 
high-resolution analysis enables the construction of a detailed reprogramming 
route map, and the improved understanding of the reprogramming 
process will lead to new reprogramming strategies. 
Several reports have suggested that reprogramming progresses in an 
ordered manner. To identify markers whose expression changed 
concurrent with pluripotency gene expression, we performed time 
course microarray analysis using a piggyBac transposon-based secondary 
reprogramming system (Supplementary Fig. 2a). Of a number 
of candidate cell-surface markers, Cd44 and Icam1 (also known as 
CD54) demonstrated the most dynamic expression changes through- 
out secondary mouse embryonic fibroblast (MEF) reprogramming 
(Supplementary Fig. 2b). For further investigation, we generated an 
efficient secondary reprogramming system in which doxycycline- 
mediated induction of the reprogramming factors could be monitored 
by an mOrange reporter placed after the 2A-peptide-linked repro- 
gramming cassette c-Myc-KIf4-Oct4-Sox2 (MKOS)”, and. endogen- 
ous Nanog promoter activation could be followed by expression of 
enhanced green fluorescent protein (eGFP)'* (Supplementary Fig 3). 
Reprogramming cultures were supplemented with vitamin C and an 
Alk inhibitor, both of which enhance reprogramming efficiency'®'*"*. 
In this secondary reprogramming system, Nanog-eGFP * cells appeared 
as early as day6, and >60% of mOrange* transgene-expressing cells 
were found to be Nanog-eGFP* by day 12 (Supplementary Figs 4 
and 5a). Most mOrange” transgene-expressing cells lost expression of 
Thy1 (also known as CD90) and gained E-cadherin (also known as 
Cdh1) expression by day 4 (Supplementary Fig. 5b, c). Expression of 
stage-specific embryonic antigen 1 (SSEA-1, also known as Fut4) barely 


changed after day 8, with a gradual gain of Nanog-eGFP” cells in both 
SSEA-1* and SSEA-1~ cell populations (Supplementary Fig. 5d). 
Consistent with heterogeneous expression of SSEA1 in iPS and embry- 
onic stem (ES) cells, it was not possible to delineate the reprogramming 
process accurately using SSEA- 1 (Supplementary Fig. 6). By contrast, the 
appearance of CD44~ and ICAM1" cells at later time points closely 
correlated with Nanog-eGFP expression (Supplementary Fig. 5e, f). 
Double staining for CD44 and ICAM1 revealed that a distinct series
of population changes occurs during reprogramming (Fig. 1). Initially,
MEFs displayed high CD44 and broad ICAM1 expression, with most
becoming ICAM1− by day 6, along with the appearance of a minor
CD44− ICAM1− cell population. By day 8, CD44− populations
appeared enriched, and at day 12 almost all cells displayed an
iPS/ES-cell-like CD44− ICAM1+ profile, of which more than 60% expressed
Nanog-eGFP. Consistent with the observation that Nanog expression is
not necessarily a sign of completed reprogramming, Nanog-eGFP+
cells were observed even before cells obtained this iPS/ES-cell-like
phenotype (CD44− ICAM1+). Both ICAM1+- and ICAM1−-sorted
MEFs demonstrated similar fluorescence-activated cell sorting (FACS)
profile changes during reprogramming (Supplementary Fig. 7).
Immunofluorescence for CD44 and ICAM1 revealed that reprogramming
is not synchronized even within individual colonies (Supplementary
Fig. 8). Secondary reprogramming of the non-polycistronic iPS
cell line 6c (refs 3, 11) and primary reprogramming using MKOS and
Oct4-P2A-Sox2-T2A-Klf4-E2A-cMyc (OSKM) piggyBac transposons
resulted in similar ICAM1 and CD44 profile changes, indicating their
suitability for use in other systems and contexts (Supplementary Fig. 9).
These findings demonstrated the asynchronous but stepwise manner of
reprogramming, and highlighted the potential usefulness of CD44 and
ICAM1 to isolate intermediate reprogramming subpopulations.

Next, we aimed to confirm that the observed CD44/ICAM1 profile
changes reflected the transition of individual cells from one stage to the
next, and not merely the loss of one major population and expansion
of another minor population. CD44+ ICAM1− (gate 1), CD44−
ICAM1− (gate 2) and CD44− ICAM1+ (gate 3) cell populations, either
Nanog-eGFP+ (that is, 1NG+, 2NG+ and 3NG+) or Nanog-eGFP−
(1NG−, 2NG− and 3NG−), were isolated by cell-sorting at day 10 of
reprogramming and re-plated in reprogramming conditions (Fig. 2a).
After 3 days, both NG+ and NG− cells progressed in the order of gates
1 to 2 to 3 (Fig. 2b). This progression correlated well with increased
Nanog-eGFP+ colony-forming potential (c.f.p.), with 3NG+ cells
displaying similar clonogenicity to fully reprogrammed iPS cells (Fig. 2c).
Of cells with the same CD44/ICAM1 profile, Nanog-eGFP expression
correlated with a higher c.f.p. (for example, 1NG+ versus 1NG−).
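
The gating scheme described above can be written down as a small lookup. The following Python sketch is illustrative only: it assumes that each cell has already been called positive or negative for CD44, ICAM1 and Nanog-eGFP (the fluorescence thresholds are set in the FACS software and are not modelled here), and the function name `classify_cell` is hypothetical.

```python
def classify_cell(cd44_pos: bool, icam1_pos: bool, nanog_gfp_pos: bool) -> str:
    """Return a subpopulation label such as '1NG-' or '3NG+' for one cell.

    Gate 1: CD44+ ICAM1-; gate 2: CD44- ICAM1-; gate 3: CD44- ICAM1+.
    CD44+ ICAM1+ cells fall outside the three gates described in the text.
    """
    if cd44_pos and not icam1_pos:
        gate = "1"
    elif not cd44_pos and not icam1_pos:
        gate = "2"
    elif not cd44_pos and icam1_pos:
        gate = "3"
    else:
        return "ungated"
    return f"{gate}NG{'+' if nanog_gfp_pos else '-'}"


# Order of progression reported in the text: gate 1, then gate 2, then gate 3.
PROGRESSION = ["1", "2", "3"]
```

For example, a CD44− ICAM1+ Nanog-eGFP+ cell is labelled `3NG+`, the population reported to have clonogenicity similar to that of fully reprogrammed iPS cells.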

To examine the progression of the reprogramming process more accur- 
ately, cells from each gate were sorted, and their expression of CD44/ 
ICAM1/Nanog-eGFP was re-analysed after 24h (Fig. 2d). On the basis 
of total cell numbers in each gate after 24h (Supplementary Fig. 10), we 
generated a reprogramming route map representing differences in the 
Figure 1 | FACS analysis during secondary reprogramming of MEFs with
CD44/ICAM1 double staining. Loss of CD44 expression was rapidly followed 
by ICAM1 upregulation and Nanog-eGFP expression. By day 12, most cells 


efficiency of these stage transitions and in Nanog-eGFP+ c.f.p. (Fig. 2e).
Similar results were obtained when each subpopulation was sorted at
day 8 (Supplementary Fig. 11). This analysis revealed that reaching a
Nanog-eGFP+ state is a rate-limiting step (few cells overcame this
barrier in the 24 h assay), and those that do so reprogram more efficiently
than their Nanog-eGFP− counterparts, consistent with the role of Nanog
as an accelerator of reprogramming and the gateway to pluripotency.
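
One way to picture how such a route map can be assembled from the 24 h re-analysis is sketched below in Python. This is not the authors' procedure in code form: the function name `major_transitions` is hypothetical, the cell numbers are invented for illustration, and only the ">500 cells" cut-off is taken from the Figure 2e legend.

```python
from collections import defaultdict


def major_transitions(counts, min_cells=500):
    """Keep only transitions supported by more than `min_cells` cells.

    `counts` maps (sorted_population, population_after_24h) to a cell number.
    The result maps each sorted population to its major destinations.
    """
    route_map = defaultdict(dict)
    for (source, destination), n_cells in counts.items():
        if n_cells > min_cells:
            route_map[source][destination] = n_cells
    return dict(route_map)


# Hypothetical cell numbers for illustration only; not data from the paper.
example_counts = {
    ("2NG-", "2NG-"): 4000,
    ("2NG-", "3NG-"): 1200,
    ("2NG-", "2NG+"): 150,  # below the cut-off, so not drawn as an arrow
}
```

In a plot of this kind, the retained cell numbers would set the arrow sizes, and each population would be placed on the y axis according to its relative c.f.p.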

To determine global gene expression changes during these stage 
transitions, we carried out RNA-sequencing analysis using a highly 




displayed an ICAM1+/CD44− ES-cell-like profile. Red denotes Nanog-eGFP−
cells; green denotes Nanog-eGFP+ cells.


multiplexed sample bar-coding system (see Methods and Supplementary
Table 1). Hierarchical clustering using the complete list
of differentially expressed genes (DEGs) revealed four major branches:
(1) MEFs; (2) 1NG−/+ and 2NG−; (3) 2NG+ and 3NG−/+; and (4)
3NG+ sorted at day 15 (3NG+D15), iPS and ES cells (Fig. 3a). There
was a prominent gene expression difference between 3NG+ and
3NG+D15 cells, with the latter being more similar to iPS and ES cells
(Fig. 3a and Supplementary Fig. 12), possibly reflecting the observed
difference in the c.f.p. in the absence of doxycycline (Supplementary
Figure 2 | CD44/ICAM1 subpopulations represent distinct stages of
reprogramming. a, Nanog-eGFP+ (NG+) and Nanog-eGFP− (NG−) cells
were subdivided into CD44+ ICAM1− (gate 1), CD44− ICAM1− (gate 2) and
CD44− ICAM1+ (gate 3) populations at day 10 of reprogramming. b, FACS
analysis of sorted subpopulations after a 3-day culture in the presence of
doxycycline (dox). c, Relative probability to generate Nanog-eGFP+ iPS cell
colonies from each subpopulation compared to fully reprogrammed iPS cells.
Error bars represent s.d., n = 3. d, Expression of CD44, ICAM1 and
Nanog-eGFP was re-analysed 24 h after sorting. e, Major transitions (>500
cells) of each population within 24 h. The y axis indicates relative c.f.p.
after a further 10 days. Arrow size reflects relative cell numbers.
Figure 3 | Global gene expression changes during the stage transition.
a, Hierarchical clustering of samples with DEGs and expression heat map.
Groups A-E represent different expression patterns. b, Early (left) and late 
(right) upregulation of pluripotency-related genes. Black and red asterisks 
indicate early and late pluripotency genes, respectively, previously identified by 


Fig. 13). The DEGs between these two populations may be involved in
the establishment of an exogenous-factor-independent self-renewal
state. Principal component analysis clearly distinguished 2NG+ from
3NG− cells, consistent with the higher probability of the former to
reach the 3NG+ state within 24 h (Supplementary Figs 10 and 12b).
DEGs could be classified into five distinct expression pattern groups
(A-E) (Fig. 3a and Supplementary Tables 2 and 3). Group A contained
readily downregulated fibroblast-related genes. Group D comprised
factors gradually upregulated towards iPS cells, in which ES cell genes
were highly enriched (P = 0.000367) (Fig. 3c). However, group C,
which contained genes upregulated at early stages and maintained
throughout reprogramming, also included some pluripotency-related
factors. To extend this finding, we examined the expression pattern of
22 pluripotency-related genes in our data set. Interestingly, 8
pluripotency genes, including endogenous Oct4 (also known as Pou5f1),
were already upregulated at the 1NG+/2NG− stages to the level found
in 3NG+ cells (Fig. 3b, left), whereas 14 pluripotency genes were more
gradually upregulated in the later stage reprogramming populations
(Fig. 3b, right, and Supplementary Table 4). This early and late
pluripotency gene upregulation was confirmed at the single cell level
(Fig. 3e), highlighting the high resolution of the CD44/ICAM1 sorting
system.
We also identified two additional gene expression patterns display- 
ing transient upregulation (group B) or downregulation (group E) 
exclusively in the intermediate stages of reprogramming. This finding 
indicates that reprogramming from MEFs to iPS cells is not simply the 
loss of MEF genes and gain of ES cell genes. Gene Ontology analysis 
single-cell quantitative PCR (qPCR). c, Epidermal and stem-cell gene
enrichment in gene lists B and D, respectively. d, Transient upregulation of 18
epidermis/keratinocyte-related genes during reprogramming. e, Single-cell
gene expression analysis. Each square represents one reaction chamber from
one cell. Colour corresponds to ΔCt value, as shown in the legend.


revealed that genes related to ectoderm/epidermis development and
keratinocyte differentiation were highly enriched in group B
(P = 0.000274) (Fig. 3c, d and Supplementary Tables 3-5). Although
SFN and KRT17 were barely detectable by immunofluorescence in
MEFs and iPS cells, transient upregulation was observed in the
intermediate stages of reprogramming (Supplementary Fig. 14). Single-cell
PCR confirmed the co-expression of epidermis genes (Ehf and
Ovol1) with early pluripotency genes in the 1NG−/+ stage (Fig. 3e).
Consistent with our data, analysis of three published microarray data
sets incorporating partially reprogrammed iPS cells, a time course
experiment and a subpopulation analysis with Thy1, SSEA-1 and
Oct4-eGFP (ref. 6) confirmed transient epidermal gene expression
during reprogramming (Supplementary Figs 15-17 and Supplementary
Tables 6-8). Partially reprogrammed cells from B cells also
displayed similar epidermis gene expression, whereas two-factor
reprogramming (Oct4 and Sox2) of MEFs did not. Therefore, this
intermediate state could be a consequence of the use of Klf4, which is
important for efficient reprogramming, and demonstrates that the
reprogramming process is not simply a reversion of normal
differentiation (summarized in Supplementary Fig. 1). It would be
intriguing to investigate whether similar transient gene expression
changes can be seen in reprogramming of ectoderm or endoderm lineages.
Downregulation of these epidermis genes coincided with upregulation of
'late' pluripotency genes. Future examination of this rapid switch in
gene expression may provide a new insight into the molecular
mechanism of reprogramming.
The integrative data analysis described above demonstrated that this
CD44/ICAM1/Nanog-eGFP marker system could uniquely provide
high-resolution information during late pluripotency gene upregulation,
enabling the discrimination of 'reprogramming' from 'expansion
of reprogrammed cells' (Fig. 3b and Supplementary Figs 16b and 17).
This system also refines investigation of the kinetics of
reprogramming. It has recently been shown that vitamin C increases
reprogramming efficiency by facilitating histone 3 Lys 9 (H3K9)
demethylation, and that reprogramming factors fail to bind
trimethylated H3K9-rich regions in the initial stages of reprogramming.
We carried out reprogramming in the absence of vitamin C and observed
not only a decrease in the iPS cell colony number, but also a marked
delay in the transition from one stage of reprogramming to the next
(Supplementary Fig. 18). Similar analyses can be performed using our
marker system to investigate the mechanism of action of other factors
that alter reprogramming efficiency. Isolation and analysis of
subpopulations affected by these factors could reveal the downstream
genes specifically involved in, and required for, successful
reprogramming. Further studies using this high-resolution analysis
system have the potential to make a considerable contribution towards
revealing the molecular mechanisms of reprogramming.


METHODS SUMMARY 


The vector PB-TAP IRI 2LMKOSimO, a modified version of polycistronic
reprogramming vector pCAG2LMKOSimO (ref. 12), containing insulator and
replicator sequences and driven by the tetO promoter, was constructed as
described in the Methods. This vector was used to generate iPS cell line
D6s4B5 from reverse tetracycline transactivator (rtTA)-expressing MEFs
carrying a Nanog-eGFP reporter. D6s4B5 iPS cells were used to generate
chimaeric embryos from which MEFs were isolated at embryonic day 12.5.
Transgenic MEFs were cultured in doxycycline (300 ng ml−1), vitamin C
(10 µg ml−1) and Alk inhibitor (500 nM), and collected for flow
cytometry analysis (BD Fortessa), carried out using
antibodies for CD44 and ICAM1 every 2-3 days. Cells were sorted (BD FACS
Aria II) at day 10 or 15, and replated on gelatin for analysis at 24 h, or at clonal
density on irradiated MEFs for Nanog-eGFP+ c.f.p. 10 days after cell sorting. All
flow cytometry data were analysed using FlowJo (Tree Star). Immunofluorescence
was carried out using confocal microscopy (Leica TSC SP2). RNA from sorted
samples was extracted using Trizol (Invitrogen), and 10 ng total RNA was used for
multiplexed RNA-sequencing. Data were analysed using GeneProf, and
DEGs were identified using edgeR and DESeq Bioconductor libraries. Gene
Ontology enrichment was calculated using DAVID.


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS


Vector construction. The piggyBac transposon PB-TAP containing the tetO2
promoter, an attR1R2 Gateway cloning cassette (Invitrogen) and rabbit β-globin
poly A signal, was provided by A. Nagy. To minimize silencing of the
reprogramming vector, a chicken β-globin insulator was inserted into the PacI
site between the piggyBac 3′-terminal repeat (3′-TR) and the tetO2 promoter,
and a human lamin B2 (LMB2) replicator plus another chicken β-globin insulator
were inserted into the EcoRV site between the rabbit β-globin poly A signal and
the piggyBac 5′-TR, to generate PB-TAP IRI. The BamHI fragment containing
loxP-flanked MKOS reprogramming cassette followed by ires-mOrange
(2LMKOSimO) from pCAG2LMKOSimO (ref. 12) was inserted into a Gateway
entry vector pENTR 2B (Life Technologies), to generate attP2LMKOSimO
pENTR. Finally the attP2LMKOSimO cassette was Gateway-cloned into the
PB-TAP IRI to yield reprogramming piggyBac transposon PB-TAP IRI
attP2LMKOSimO. Similarly, reprogramming piggyBac transposon PB-TAP IRI
2LOSKMimO was generated after transferring the OSKM reprogramming
cassette into attP2LMKOSimO pENTR replacing the MKOS cassette. Plasmid
sequences are available on request.

Generation of a primary iPS cell line D6s4B5. Embryos at 12.5 days post coitum
(d.p.c.) were obtained from Rosa26^rtTA/+, Nanog^eGFP/+, Col1a1^+/+ mice, which
were derived by crossing TNG mice and
B6;129-Gt(ROSA)26Sortm1(rtTA*M2)Jae Col1a1tm2(tetO-Pou5f1)Jae/J mice
(Jackson Laboratory). The embryos were decapitated,
eviscerated, dissociated with 0.25% trypsin and 0.1% EDTA, and plated in MEF
medium (GMEM, 10% FBS, penicillin-streptomycin, 1× non-essential amino
acids (Invitrogen), 1 mM sodium pyruvate and 0.05 mM 2-mercaptoethanol).
The PB-TAP IRI attP2LMKOSimO (500 ng) and pCyL43 piggyBac transposase
expression vector (2 µg) were introduced into the MEFs by nucleofection
(Amaxa) as before, and cells were cultured in ES cell medium (MEF medium
supplemented with 1,000 U ml−1 leukaemia inhibitory factor (LIF)) in the
presence of 1.0 µg ml−1 doxycycline (Sigma) for an initial 8 days, and
thereafter 0.5 µg ml−1 doxycycline. Pluripotency of a clonal iPS cell line D6
was confirmed by teratoma formation, and a subclone D6s4B5 was used for
secondary reprogramming. To compare CD44 and ICAM1 profiles of primary
reprogramming with PB-TAP IRI attP2LMKOSimO and PB-TAP IRI 2LOSKMimO
vectors, MEFs were nucleofected as above and cultured in the presence of
1.0 µg ml−1 doxycycline, 10 µg ml−1 vitamin C (Sigma) and 500 nM Alk
inhibitor A 83-01 (TOCRIS Bioscience).

Secondary reprogramming. Each chimaeric embryo was collected at 12.5 d.p.c., 
dissociated and cultured in MEF medium. One-twentieth of the dissociated cells 
were exposed to doxycycline (300 ng ml−1) for 2 days, and the proportion of
transgenic MEFs was measured by FACS analysis of mOrange expression. For 
FACS time course and colony counting experiments, secondary transgenic MEFs 
were diluted to 5% and 30% by addition of 129 wild-type MEFs and plated in a 
gelatinized 6-well plate at 1 × 10^5 cells per well (5,000 and 30,000 transgenic MEFs
per well, respectively). For sorting experiments, MEFs were plated at 2 X 10° cells 
per gelatinized 100 mm plate (1 X 10* transgenic MEFs per plate). Cells were 
cultured in reprogramming medium, which is ES cell medium supplemented with 
300 ng ml−1 doxycycline, 10 µg ml−1 vitamin C and 500 nM Alk inhibitor.
Medium was changed every 2 days. 
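
The dilution arithmetic behind these seeding densities can be sketched as follows. This Python example is an illustration rather than the authors' calculation: the function name `plan_well` and the measured transgenic fractions used in the usage note are hypothetical, whereas the 5% and 30% targets and the 5,000 and 30,000 transgenic MEFs per well come from the text (which together imply 100,000 cells per well).

```python
def plan_well(total_cells: int, target_fraction: float, measured_fraction: float) -> dict:
    """Work out how many chimaera-derived and wild-type MEFs to mix per well.

    total_cells       -- cells seeded per well
    target_fraction   -- desired fraction of transgenic (mOrange+) MEFs
    measured_fraction -- fraction of transgenic MEFs measured by FACS in the
                         chimaera-derived preparation (hypothetical input)
    """
    if not 0 < target_fraction <= measured_fraction <= 1:
        raise ValueError("need 0 < target_fraction <= measured_fraction <= 1")
    transgenic = round(total_cells * target_fraction)
    from_chimaera = round(transgenic / measured_fraction)
    return {
        "transgenic": transgenic,
        "chimaera_derived": from_chimaera,
        "wild_type": total_cells - from_chimaera,
    }
```

With 100,000 cells per well and a hypothetical measured fraction of 50%, a 5% target corresponds to 5,000 transgenic MEFs, supplied by 10,000 chimaera-derived cells topped up with 90,000 wild-type MEFs.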

Flow cytometry and cell sorting. Cell-surface marker analysis was performed 
with the following eBioscience antibodies: ICAM-1-biotin (13-0541; 1/100),
CD44-biotin (17-0441; 1/100), CD44-allophycocyanin (APC) (17-0441; 1/300),
streptavidin-phycoerythrin (PE)-Cy7 (25-4317-82; 1/1,500), SSEA-1-647
(51-8813; 1/50), E-cadherin-biotin (13-3249; 1/100), Thy1-APC (17-0902; 1/300)
and CD2-biotin (13-0029; 1/100). For sorting experiments, dead cells were
excluded using 4′,6-diamidino-2-phenylindole (DAPI) nucleic acid stain
(Invitrogen) (0.5 ng ml−1). Cells were incubated in 0.25% trypsin and 1 mM
EDTA (Life Technologies) for 1-2 min at 37 °C, collected in GMEM media con- 
taining 10% FCS and counted. Staining was carried out in FACS buffer (2% FCS in 
PBS) at ~1 X 10° cells ml−1 for 15-30 min at 4 °C, followed by washing with
FACS buffer, sorting and/or analysis with FACSAria II and LSRFortessa (both BD
Biosciences), respectively. Excitation laser lines and filters used for each fluoro- 
phore are summarized in Supplementary Table 9. Data were analysed using 
FlowJo software (Tree Star). Intact cells were identified based on forward and side 
light scatter, and subsequently analysed for fluorescence intensity. Additional 
gating was carried out as outlined in Supplementary Fig. 2. For colony formation 
assays, sorted cells were plated on γ-irradiated MEFs in 12-well plates at 3.5 X 10°
cells per well. Nanog-eGFP* colonies were quantified 10 days after sorting. For 
24h or time-course analysis, sorted cells were plated in gelatinized 48-well plate at 
1 X 10* cells per well. In both cases, cells were cultured in reprogramming medium 
after sorting. 


Immunofluorescence and confocal microscopy imaging. Images of cells stained
with ICAM-1-biotin (1/100), CD44-APC (1/300) and streptavidin-PE-Cy7
(1/1,500) antibodies described above were captured with a confocal microscope
(Leica TSC SP2) and Leica confocal software. Cells stained with anti-Krt17
(LifeSpan BioSciences) and anti-Sfn (Sigma) antibodies and anti-Rabbit IgG
CF633 secondary antibody (Sigma) were imaged with a fluorescence microscope
(Olympus).

Multiplexed RNA sequencing and data analysis. RNA was isolated with TRI
reagent (Sigma) following the manufacturer's instructions. RNA quality and
concentration were determined using the Agilent 2100 Bioanalyzer (Agilent
Technologies). Using 10 ng RNA, reverse transcription with bar-coded primers,
complementary DNA amplification, and sequencing with Illumina HiSeq 2000
were performed as previously described. Quality control of the obtained reads
and alignment to the mouse reference genome (NCBI37/mm9) were performed
using the GeneProf web-based analysis suite with default parameters. Gene
expression read counts were exported and analysed in R to identify DEGs, using
the edgeR and DESeq Bioconductor libraries. For both methods, low-expression
transcripts (fewer than 13 reads in all samples) were filtered out, and P values
were adjusted using a false discovery rate (FDR) threshold of 0.05. Genes listed
as DEGs by both methods in any pairwise subpopulation comparison indicated in
Supplementary Table 1 and Supplementary Fig. 12a (total 3,171) were used for
further analysis. Hierarchical clustering and K-means clustering (K = 5) were
performed using Cluster 3.0, and Java Treeview was used for visualization. This
multiplexed RNA-sequencing technology reads only the 5′ end of each transcript,
thus detecting only endogenous Oct4 and Sox2. Nanog expression was detectable in
Nanog-eGFP populations owing to the reporter system. Principal components
analysis was performed in R and plotted with the scatterplot3d library. Gene
Ontology enrichment was calculated using the DAVID functional annotation
bioinformatics tool. Gene Ontology term enrichment analysis was carried out
with a modified Fisher exact P value. The three additional published studies
(GEO accession numbers GSE21757, GSE14012 and GSE42379) were analysed in
a similar way. For the time course data, the analysis was performed as follows:
data were robust multi-array average (RMA) normalized using the Expression
Console from Affymetrix and, because no replicates were provided, fold changes
between two samples were calculated in Excel. Genes with more than 1.5-fold
changes were classified as DEGs. For the Plath and Polo data sets, data were
RMA-normalized using the 'affy' package in R, and DEGs were identified using the
'limma' package in R with thresholds of fold change 1.5 and FDR 0.05, or fold
change 1.5 alone where no replicates were available. Subsequently, K-means
clustering of the identified DEGs was performed for all studies. Selected gene
expression data are shown as the relative expression against the highest signal
among the samples using an averaged signal value (reads per million) of
duplicates/triplicates.
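
Three of the bookkeeping steps described above (the low-expression filter, the requirement that a gene be called by both methods, and scaling to the highest signal) can be sketched in plain Python. This is a simplified illustration, not the R workflow used in the study: the function names are hypothetical, the DEG lists are assumed to have been produced already by edgeR and DESeq, and no statistical testing is performed here.

```python
def filter_low_expression(counts, min_reads=13):
    """Drop genes that have fewer than `min_reads` reads in all samples.

    `counts` maps gene name -> list of read counts, one per sample.
    """
    return {gene: c for gene, c in counts.items() if max(c) >= min_reads}


def consensus_degs(degs_method_a, degs_method_b):
    """Genes listed as differentially expressed by both methods."""
    return set(degs_method_a) & set(degs_method_b)


def relative_expression(mean_signal_by_sample):
    """Scale averaged signals so that the highest sample equals 1.0."""
    top = max(mean_signal_by_sample.values())
    if top == 0:
        return {sample: 0.0 for sample in mean_signal_by_sample}
    return {sample: value / top for sample, value in mean_signal_by_sample.items()}
```

A gene is kept by the filter as soon as a single sample reaches 13 reads, which matches the wording "fewer than 13 reads in all samples" for the genes that are removed.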

Single-cell gene expression analysis. Single-cell qPCR was performed as
described previously with slight modifications. In brief, 22 sets of TaqMan gene
expression assays (Applied Biosystems; Supplementary Table 9) were pooled at a
final concentration of 180 nM per primer set and 50 nM per probe. Individual cells
were sorted directly into 10 µl RT-PreAmp Master Mix (5 µl of CellsDirect
reaction mix (Invitrogen), 2.5 µl of pooled assays, 0.2 µl of SuperScript III
(Invitrogen), 1.3 µl of water) using FACSAria II. Cell lysis and
sequence-specific reverse transcription were performed at 50 °C for 15 min.
Reverse transcriptase was inactivated by heating to 95 °C for 2 min.
Subsequently, in the same tube, cDNA went through sequence-specific
amplification by denaturing at 95 °C for 15 s, and annealing and
amplification at 60 °C for 4 min for 22 cycles. Preamplified products were diluted
fivefold with water and analysed in 48.48 dynamic arrays on a BioMark system
(Fluidigm) following the Fluidigm protocol. Ct values were calculated and
visualized using BioMark real-time PCR analysis software (Fluidigm). Each assay
was performed in replicate.
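
For bench planning, the per-cell volumes and cycling times above can be tallied as in the following Python sketch. It is an illustration under stated assumptions: the function names and the `overage` allowance are hypothetical and not part of the protocol, and the component volumes are copied from the text. The four itemized components sum to about 9 µl, whereas the text states 10 µl of master mix; the sketch uses the itemized volumes and makes no assumption about the remainder.

```python
# Per-cell component volumes (µl) as itemized in the text.
RT_PREAMP_COMPONENTS_UL = {
    "CellsDirect reaction mix": 5.0,
    "pooled TaqMan assays": 2.5,
    "SuperScript III": 0.2,
    "water": 1.3,
}


def master_mix(n_cells: int, overage: float = 0.1) -> dict:
    """Scale the per-cell volumes to n_cells, with a pipetting overage.

    The 10% default overage is an assumption for illustration only.
    """
    scale = n_cells * (1 + overage)
    return {name: round(vol * scale, 2) for name, vol in RT_PREAMP_COMPONENTS_UL.items()}


def preamp_cycling_minutes(cycles: int = 22, denature_s: int = 15, anneal_s: int = 240) -> float:
    """Time spent in the cycling steps only (excludes ramping and the RT step)."""
    return cycles * (denature_s + anneal_s) / 60
```

With the stated 22 cycles of 15 s at 95 °C and 4 min at 60 °C, the cycling steps alone take 93.5 min.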
ZFP36L2 is required for self-renewal of early
burst-forming unit erythroid progenitors


Lingbo Zhang, Lina Prak, Violeta Rayon-Estrada, Prathapan Thiru, Johan Flygare, Bing Lim & Harvey F. Lodish


Stem cells and progenitors in many lineages undergo self-renewing 
divisions, but the extracellular and intracellular proteins that regu- 
late this process are largely unknown. Glucocorticoids stimulate 
red blood cell formation by promoting self-renewal of early burst- 
forming unit-erythroid (BFU-E) progenitors. Here we show
that the RNA-binding protein ZFP36L2 is a transcriptional target 
of the glucocorticoid receptor (GR) in BFU-Es and is required for 
BFU-E self-renewal. ZFP36L2 is normally downregulated during 
erythroid differentiation from the BFU-E stage, but its expression 
is maintained by all tested GR agonists that stimulate BFU-E self- 
renewal, and the GR binds to several potential enhancer regions of 
ZFP36L2. Knockdown of ZFP36L2 in cultured BFU-E cells did not 
affect the rate of cell division but disrupted glucocorticoid-induced 
BFU-E self-renewal, and knockdown of ZFP36L2 in transplanted 
erythroid progenitors prevented expansion of erythroid lineage 
progenitors normally seen following induction of anaemia by phe- 
nylhydrazine treatment. ZFP36L2 preferentially binds to messen- 
ger RNAs that are induced or maintained at high expression levels 
during terminal erythroid differentiation and negatively regulates 
their expression levels. ZFP36L2 therefore functions as part of a 
molecular switch promoting BFU-E self-renewal and a subsequent 
increase in the total numbers of colony-forming unit-erythroid 
(CFU-E) progenitors and erythroid cells that are generated. 

Humans generate 10^11 erythrocytes every day, a process regulated by
multiple hormones affecting several types of progenitors. Apoptosis,
proliferation and terminal differentiation of CFU-E erythroid
progenitors are mainly controlled by erythropoietin (EPO). In contrast,
many hormones including EPO, stem cell factor (SCF), interleukin-3
(IL-3) and interleukin-6 (IL-6) regulate the earlier BFU-E progenitors,
but we do not know how they interact to control BFU-E quiescence,
self-renewal divisions or cell divisions yielding the later CFU-E
progenitors. Under stress conditions such as acute blood loss or chronic
anaemia, glucocorticoids trigger self-renewal of BFU-E progenitors in
the spleen, leading to increased numbers of self-renewal divisions. This
results in increased BFU-E numbers and, over time, formation of
increased numbers of CFU-E progenitors and subsequently of mature
erythrocytes.

To identify GR-activated genes essential for BFU-E self-renewal,
BFU-Es were purified and cultured in a medium (self-renewal medium)
containing SCF, EPO, insulin-like growth factor 1 (IGF-1), and several
full or dissociated GR agonists. All agonists, except for one dissociated
agonist, stimulate BFU-E self-renewal (Supplementary Fig. 1). Because
the genes upregulated by all functional agonists represent candidates
indispensable for BFU-E self-renewal, we performed deep sequencing
on mRNAs from BFU-Es cultured with GR agonists for 4 h, and
identified a group of genes upregulated by all functional agonists but not
by nonfunctional agonists (Supplementary Table 1). We focused on
three genes normally downregulated during erythroid differentiation,
Zfp36l2, Hopx and Nlrp6 (Supplementary Fig. 2b, c). As detailed later,
knockdown of Hopx and Nlrp6 resulted in a defect in BFU-E
proliferation. In contrast, knockdown of Zfp36l2, the most abundant
transcript upregulated by GR agonists, did not affect the initial rate of
BFU-E division.

During erythroid differentiation in vivo, ZFP36L2 is downregulated 
from the BFU-E stage (Fig. 1a, b). BFU-Es cultured in vitro for 4h 
with all functional GR agonists showed a ~2.5-fold upregulation of 
ZFP36L2 that was maintained throughout the culture (Fig. 1c-e). 
Given that ZFP36L2 is upregulated after only 4h, ZFP36L2 is likely 
a direct transcriptional target of the GR. Thus we performed GR chro- 
matin immunoprecipitation sequencing (ChIP-seq) on freshly isolated 
BFU-Es after 1h stimulation with dexamethasone (DEX) and iden- 
tified five GR binding sites near the Zfp36l2 transcription start site 
(TSS), potential enhancers of Zfp36l2 (Supplementary Fig. 3). Four
of these sites responded to DEX, indicating that they are functional 
enhancers regulated by glucocorticoids (Fig. 1g). Together, these 
results indicate that ZFP36L2 is a direct transcriptional target of the 
GR in BFU-Es. 

ZFP36L2 belongs to an RNA-binding protein family. On the basis
of RNA-seq gene expression data from purified BFU-E, CFU-E and
Ter119-positive (Ter119+, also known as LY76) erythroblasts,
ZFP36L2 is ~20 times more abundant in erythroid progenitors than 
its other two family members, ZFP36 and ZFP36L1, and is the only 
member upregulated by DEX (Fig. 1f and Supplementary Fig. 4), 
indicating that ZFP36L2 is the major family member involved in regu- 
lation of BFU-E self-renewal. Furthermore, ZFP36L2 is gradually 
downregulated from the haematopoietic stem cell (HSC) to the early 
and late erythroid progenitor stages (Supplementary Fig. 5). In sum- 
mary, glucocorticoid treatment of BFU-Es reverses normal downre- 
gulation of ZFP36L2, correlating with glucocorticoid-induced BFU-E 
self-renewal. 

To test whether upregulation of ZFP36L2 is required for glucocorticoid- 
induced BFU-E self-renewal, we used two short hairpin RNAs (shRNAs) 
to knock down expression of ZFP36L2 in BFU-Es (Fig. 2a, b). BFU-Es 
cultured without DEX stop proliferating at 4 days; as shown previously,
in the absence of DEX, each BFU-E generates several CFU-Es, each of
which generates 10-30 erythroid cells over a 5 to 6 day period. In
contrast, BFU-Es cultured in the presence of DEX continue to proliferate
and generate over 10 times more mature erythroid cells at 9 days of
culture. As shown previously, in the presence of DEX, each BFU-E
generates multiple daughter BFU-Es during the first days of culture;
over time these BFU-Es generate increased numbers of daughter
CFU-Es, each of which generates 10-30 erythroid cells. Importantly,
BFU-Es expressing either Zfp36l2 shRNA stop proliferating at day 4,
whether or not DEX is included; the proliferation kinetics are similar to 
those of BFU-Es cultured without DEX (Fig. 2c). This indicates that 
knockdown of ZFP36L2 disrupts DEX-induced self-renewal of BFU-Es. 

As one control, knockdown of c-Kit, a cell surface receptor required 
for the survival of haematopoietic stem and progenitor cells including 
Figure 1 | Normal downregulation of ZFP36L2 during erythroid
differentiation from the BFU-E stage is reversed by functional GR agonists.
a, The expression levels of Zfp36l2 mRNA in BFU-Es, CFU-Es, and Ter119+
erythroblasts were measured by RNA-seq. b, The expression levels of Zfp36l2
mRNA in BFU-Es and CFU-Es were measured by RT-PCR normalized to 18S
ribosomal RNA (rRNA). Error bar represents standard deviation (s.d.) (n = 3).
c, The expression levels of Zfp36l2 mRNA in BFU-Es after 4 h culture in
self-renewal medium with indicated GR agonists were measured by RNA-seq.
d, The expression levels of Zfp36l2 mRNA in BFU-Es after 3 days culture in
self-renewal medium with or without DEX were measured by RT-PCR
normalized to 18S rRNA. Error bar represents s.d. (n = 3). e, The expression
levels of ZFP36L2 of samples shown in panel d were measured by western blot.
Representative data are shown (n = 3). f, The expression levels of Zfp36,
Zfp36l1 and Zfp36l2 mRNAs in BFU-Es, CFU-Es and Ter119+ erythroblasts
were measured by RNA-seq; data shown are relative RPKM (reads per kilobase
per million mapped reads) values normalized to the expression level of Zfp36 in
Ter119+ erythroblasts. g, Luciferase reporter vectors cloned with each GR
binding site or empty vector were co-transfected with XZ-GR into 293T cells
and cultured in medium containing 1 µM DEX. Luciferase activities were
measured 2 days later. Error bar represents s.d. (n = 3). All P values were
calculated using the two-tailed t-test.


BFU-Es, resulted in a blockage of BFU-E proliferation after only 1 day
of culture (Supplementary Fig. 2a). HOPX and NLRP6 have expression
patterns similar to that of ZFP36L2: both are downregulated during erythroid
differentiation from the BFU-E stage, are upregulated by DEX, and
possess promoter regions that, based on our GR ChIP-Seq data, are
occupied by GR. Knockdown of either gene also resulted in a blockage
of BFU-E proliferation after only 1 day of culture (Supplementary
Fig. 2b, c).

To establish that ZFP36L2 is specifically required for BFU-E self- 
renewal, 3-day BFU-E cultures were tested for their number of daugh- 
ter BFU-Es by colony assays. Knockdown of ZFP36L2 significantly 
decreased the number of BFU-Es formed in the presence of DEX 
(Fig. 2d). Knockdown of ZFP36L2 had no influence on apoptosis of 
BFU-Es (Fig. 2e) and, as expected on the basis of its low level of expression
in CFU-Es, knockdown of ZFP36L2 had no effect on erythroid dif- 
ferentiation beyond the CFU-E stage (Supplementary Fig. 6). These 


[Figure 2 plot residue. Axes: 'Relative level of Zfp36l2 mRNA'; 'Fold cell expansion' against 'In vitro culture days'. Legend: control shRNA (+DEX), control shRNA (−DEX), Zfp36l2 shRNA1 (+DEX), Zfp36l2 shRNA2 (+DEX).]


Figure 2 | ZFP36L2 is specifically required for BFU-E self-renewal. a, The
expression levels of Zfp36l2 mRNA in BFU-Es infected with viruses encoding
indicated shRNAs followed by 1 day culture in self-renewal medium with
DEX were measured by RT-PCR normalized to 18S rRNA. Error bar represents
s.d. (n = 3). b, The expression levels of ZFP36L2 in BFU-Es infected with
viruses encoding indicated shRNAs followed by 3 days culture in self-renewal
medium with DEX were measured by western blot. Representative data are
shown (n = 3). c, BFU-Es were infected with viruses encoding indicated
shRNAs and cultured in self-renewal medium with or without DEX. Relative
cell numbers throughout the culture are shown. Error bar represents s.d.
(n = 3). d, Day 3 cells from this in vitro BFU-E culture system were plated in
methylcellulose medium; BFU-E colonies were counted 9 days later. Error bar
represents s.d. (n = 3). e, Day 3, 4 and 5 cells from this in vitro BFU-E culture
system were stained with fluorophore-conjugated annexin V and
7-aminoactinomycin D (7-AAD) and the percentages of double-negative live
cells are shown. Error bar represents s.d. (n = 3). All P values were calculated
using the two-tailed t-test.


data indicate that ZFP36L2 is specifically required for glucocorticoid- 
induced BFU-E self-renewal. 

Glucocorticoids and GR are required for erythroid lineage cell 
expansion in the spleen during stress erythropoiesis**. The data in 
Fig. 3, using a phenylhydrazine (PHZ)-induced haemolytic anaemia 
mouse model, show that ZFP36L2 is required for stress-induced
erythroid lineage expansion in vivo. Lineage-negative (Lin⁻) cells were
isolated and infected with viruses encoding green fluorescent protein
(GFP) and either a control shRNA or Zfp36l2 shRNAs, and then
transplanted into lethally irradiated recipient mice. Six to eight weeks
after transplantation, PHZ or control phosphate-buffered saline (PBS)
was injected intraperitoneally into transplanted mice on days 0 and 1 to
induce haemolytic anaemia and erythroid lineage expansion. On day 4,
spleens were dissected for flow cytometry analysis of multiple
haematopoietic cell types, detected by lineage-specific markers (Fig. 3a).
Zfp36l2 shRNAs effectively knocked down the expression of ZFP36L2 in the
Lin⁻ cells (Supplementary Fig. 7a) and, as expected, injection of PHZ
resulted in a ~10-fold increase in systemic glucocorticoid levels (Fig. 3b).

In control mice transplanted with Lin⁻ cells infected with control
shRNA, the majority of GFP⁺ splenic cells were
Figure 3 | ZFP36L2 is required for erythroid lineage expansion during
stress erythropoiesis in vivo. a, Schematic diagram shows the in vivo bone
marrow transplantation and PHZ-induced haemolytic anaemia mouse model.
b, Corticosterone levels were measured in mouse plasma 1 h after PBS or PHZ
injection. Error bar represents s.d. (n = 3). c, Normalized numbers of each type
of GFP⁺ haematopoietic lineage cell were calculated as the ratio of the number
of each type to the percentage of GFP⁺ cells in the Lin⁻ population
before transplantation. The normalized numbers of GFP⁺ Ter119⁺ cells are
shown. d, The normalized numbers of GFP⁺ CD11b⁺ cells are shown. All P
values were calculated using the two-tailed t-test.


Ter119-negative (Ter119⁻) non-erythroid lineage cells, as expected
(Supplementary Figs 8 and 9a). PHZ-mediated haemolysis induced
a ~20-fold expansion in the number of erythroid lineage cells (Fig. 3c) and
resulted in ~50% of GFP⁺ cells in the spleen becoming Ter119⁺
mature erythroid cells (Supplementary Fig. 9a).

Importantly, knockdown of ZFP36L2 significantly impaired this
erythroid lineage expansion (Fig. 3c). In addition, the effect of
ZFP36L2 in mediating haemolysis-induced cell expansion is specific
to the erythroid lineage, as no other haematopoietic lineage showed a
difference in the number of donor-derived GFP⁺ cells between PBS
and PHZ injection groups, with or without ZFP36L2 knockdown
(Fig. 3d and Supplementary Figs 10a, b and 11d-f). Consistent with
the increase in the percentage of Ter119⁺ erythroid lineage cells in
control mice upon PHZ injection, the percentages of other
haematopoietic lineages were decreased upon PHZ injection, and these
decreases were eliminated in the absence of ZFP36L2 (Supplementary
Figs 9a-d and 11a-c).

ZFP36L2 homozygous knockout mice die from HSC failure within
2 weeks after birth¹⁰, and thus this protein is likely required in early
haematopoietic stem and progenitor cells. Consistent with this notion,
before PHZ challenge, the percentage of GFP⁺ splenic cells is lower in
mice transplanted with Lin⁻ cells infected with viruses encoding either
Zfp36l2 shRNA than with the control shRNA; this difference is not
caused by significant differences in infection efficiency of Lin⁻ cells
before transplantation (Supplementary Fig. 7b, c).

Importantly, in mice transplanted with Lin⁻ cells expressing the
control shRNA the percentage of GFP⁺ cells in the spleen does not
significantly change after PHZ treatment, whereas in mice transplanted
with Lin⁻ cells infected with viruses encoding Zfp36l2 shRNAs, the
percentage drops markedly following PHZ treatment (Supplementary
Fig. 7d), consistent with the loss of erythroid cell expansion
(Fig. 3c). Together, these data indicate that ZFP36L2 is specifically
required for erythroid expansion during stress erythropoiesis in vivo
and are consistent with the notion that it is essential for glucocorticoid-
induced BFU-E self-renewal.

The data in Fig. 4 show that ZFP36L2 contributes to BFU-E self-
renewal by repressing expression of genes important for terminal eryth-
roid differentiation. BFU-Es were cultured in self-renewal medium
with or without DEX. At day 4, approximately 12% of the cells
generated from BFU-Es cultured with DEX differentiated into Ter119⁺
cells, whereas ~35% of the progeny of BFU-Es cultured without
Figure 4 | ZFP36L2 delays erythroid differentiation and preferentially
binds to several mRNAs that are induced or maintained at higher expression
levels during terminal erythroid differentiation. a, BFU-Es were infected
with viruses encoding the indicated shRNAs and cultured for 4 days in self-
renewal medium with or without DEX. The fraction of Ter119⁺ cells was
measured. The normalized fraction of Ter119⁺ cells was calculated as the ratio
of the fraction of Ter119⁺ cells in cultures of BFU-Es infected with viruses
encoding the indicated shRNAs and cultured under the indicated culture
conditions to the fraction of Ter119⁺ cells in cultures of BFU-Es
infected with control virus and cultured with DEX. Error bar represents s.d.
(n = 3). P values were calculated using the two-tailed t-test. b, Microarrays were
performed on BFU-Es infected with viruses encoding indicated shRNAs and
cultured for 3 days in self-renewal medium with or without DEX. The x axis
represents the relative expression of each gene calculated as a log₂ ratio of its
expression in the indicated samples. The cumulative fraction (y axis) is plotted
as a function of the relative expression (x axis). ‘All genes’ includes all of the
genes in the microarray; ‘Induced genes’ represents a group of 340 genes most
highly induced during erythroid differentiation from the CFU-E to the
Ter119⁺ erythroblast stage. P values were calculated using the Kolmogorov-
Smirnov test. c, Genes that are downregulated by at least 20% in the presence of
DEX and the subset of these genes that are upregulated by at least 20% upon
knockdown of ZFP36L2 are shown, using the same microarrays as those
shown in b. d, The x axis represents the relative expression of each gene
calculated as a log₂ ratio of its expression in CFU-Es relative to BFU-Es. The
cumulative fraction (y axis) is plotted as a function of the relative expression
(x axis). ‘Targets’ are the 2,000 microarray probes corresponding to mRNAs
most preferentially found in the ZFP36L2 immunoprecipitate from the RIP-
chip assay. ‘Non-targets’ represents all remaining genes. P value was calculated
using the Kolmogorov-Smirnov test.
DEX became Ter119⁺, consistent with a DEX-triggered delay in ter-
minal erythroid differentiation. This DEX-induced differentiation
delay was eliminated by ZFP36L2 knockdown; loss of ZFP36L2 results
in formation of ~30% Ter119⁺ cells, similar to the percentage in cul-
tures without DEX (Fig. 4a), indicating that ZFP36L2 is essential for
the glucocorticoid-induced delay of erythroid differentiation from the
BFU-E stage. Consistent with a differentiation delay evidenced by the
Ter119 marker, DEX treatment globally repressed the expression of a
group of genes most highly induced during erythroid differentiation,
and knockdown of ZFP36L2 eliminated this repression (Fig. 4b).

This conclusion is further strengthened by experiments showing 
that overexpression of ZFP36L2 in BFU-Es significantly reduces the 
rate of cell proliferation (Supplementary Fig. 12a, b), but also delays 
erythroid differentiation. After 4 days of culture in the absence of DEX,
the fraction of differentiated Ter119⁺ cells was significantly lower in
ZFP36L2-overexpressing cells than in the controls, and was similar to that
from control BFU-Es cultured with DEX (Supplementary Fig. 12c).
Although overexpression of ZFP36L2 led to a delay in erythroid dif- 
ferentiation, overexpression of ZFP36L2 in BFU-Es cultured without 
DEX was not able to rescue the self-renewal divisions, indicating that 
ZFP36L2 contributes to BFU-E self-renewal by delaying erythroid 
differentiation (Supplementary Fig. 12b). 

To identify mRNAs in BFU-Es that directly bind to ZFP36L2, we 
performed an RNA-binding protein immunoprecipitation, using a 
verified ZFP36L2-specific antibody (Supplementary Fig. 13a, b), 
coupled to a microarray (RIP-chip) assay and identified many genes 
specifically immunoprecipitated by the ZFP36L2 antibody (Sup- 
plementary Table 2). Using this unbiased genomic approach, we 
showed that mRNAs containing AU-rich elements in their 3’ untrans- 
lated regions (UTRs) are preferentially incorporated into the anti- 
ZFP36L2 immunoprecipitate (Supplementary Fig. 14); ~72% of these 
mRNAs indeed contain the core ZFP36L2 recognition motif ATTTA 
element in their 3′UTRs. This is consistent with previous reports
concerning the binding specificity of ZFP36L2¹¹,¹², but we cannot
eliminate the possibility that some mRNAs are binding to other 
unknown proteins in complexes with ZFP36L2. Strikingly, we found 
that in BFU-Es, ZFP36L2 preferentially binds to mRNAs that tend to 
be induced or maintained at higher than average expression levels 
during subsequent erythroid differentiation to the CFU-E stage 
(Fig. 4d). In addition, we found that the number of AU-rich elements 
of mRNAs bound by ZFP36L2 positively correlates with their extent of 
induction during erythroid differentiation from the BFU-E to CFU-E 
stage (Supplementary Fig. 15a, b). Thus, the expression pattern of 
ZFP36L2 negatively correlates with the expression pattern of erythroid 
differentiation-induced genes. 

To globally identify functional targets whose expression levels are 
regulated by ZFP36L2, we analysed the gene expression profile of day 3 
in vitro cultured BFU-Es with microarrays, allowing us to identify 
several genes with lower expression levels in BFU-Es cultured with 
DEX, compared with their counterparts in BFU-Es cultured without 
DEX (Fig. 4c). Repression of several of these DEX-repressed genes is 
dependent on and presumably mediated by ZFP36L2, because knock- 
down of ZFP36L2 eliminated this repression (Fig. 4c). A group of 
potential ZFP36L2 functional target genes was then identified by inter- 
secting the set of these repressed genes with the group of genes iden- 
tified by RIP-chip assay (Supplementary Fig. 16 and Supplementary 
Table 3). As shown in Supplementary Table 4, this group of potential 
functional targets contains several genes previously known to be 
important for or related to terminal erythropoiesis, including Aff1¹³,
Mafk¹⁴, Nfe2l1¹⁵, Sap30l¹⁶, Epb4.1¹⁷, Adar¹⁸, Mthfd2¹⁹ and Mfhas1²⁰.
We performed luciferase reporter assays on selected candidates, and 
found that the 3’UTRs of many of these genes are ZFP36L2 responsive 
(Supplementary Table 4). In addition, we mutated AU-rich elements 
in the 3′UTR of Aff1 and observed a statistically significant 21%
increase (P = 0.015) of luciferase activity, indicating that AU-rich ele- 
ments are required for this regulation. 
These data illustrate that ZFP36L2 globally negatively regulates the
expression of several erythroid differentiation-induced genes, among 
which some are known to be required for erythroid differentiation 
and others are unknown but induced. Regulation by ZFP36L2 seems 
similar to microRNA-mediated post-transcriptional regulation. An 
individual microRNA binds to multiple target mRNAs and down- 
regulates their expression levels often by only 20-50%; collectively, 
however, these modulations can have important biological effects. 
ZFP36L2 is normally downregulated during erythroid differentiation 
from the BFU-E stage, thus stabilizing many mRNAs required for 
terminal differentiation. ZFP36L2 transcription is enhanced by gluco- 
corticoids under stress conditions that signal erythroid lineage cell 
expansion. Upregulation of ZFP36L2 in turn negatively regulates 
multiple differentiation-induced genes, causing a delay in erythroid 
differentiation and ultimately contributing to BFU-E self-renewal 
(Supplementary Fig. 17). Altogether, our experiments uncover a novel 
mechanism that facilitates progenitor self-renewal: delaying differenti- 
ation by post-transcriptional downregulation of expression of mRNAs 
critical for progression to the next differentiation stage. 


METHODS SUMMARY 


BFU-Es were isolated from embryonic day 14.5 (E14.5) mouse fetal liver¹, infected
with viruses encoding GFP and either control shRNA or Zfp36l2 shRNAs, and
cultured in self-renewal medium containing SCF, EPO and IGF-1 with or without
DEX, and the cell numbers were counted daily throughout the in vitro culture.
Lin⁻ cells were isolated from mouse E14.5 fetal liver, infected with viruses
encoding GFP and either control shRNA or Zfp36l2 shRNAs, and transplanted into
lethally irradiated recipient mice. Six to eight weeks after transplantation, the
recipient mice were intraperitoneally injected with 60 mg per kg PHZ on day 0
and day 1. On day 4, spleens were dissected and measured for multiple types of
GFP⁺ haematopoietic cells, detected by haematopoietic lineage-specific markers.


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS


Primary BFU-Es and CFU-Es purification, retrovirus infection and in vitro
culture. Primary BFU-Es and CFU-Es were purified from mouse E14.5 fetal
liver¹. BFU-Es were then placed in virus solution in a six-well plate, followed by
overnight incubation at 37 °C. After incubation, the virus solution was replaced by
self-renewal medium containing SCF (100 ng ml⁻¹), EPO (2 U ml⁻¹)
and IGF-1 (40 ng ml⁻¹) with or without full and partial GR agonists. The cells were
then cultured in vitro at 37 °C for 9 days.

For CFU-E differentiation assay, CFU-Es were placed in virus solution, fol- 
lowed by 37 °C spin infection. After infection, virus solution was substituted by an 
EPO-containing differentiation medium, and the cells were then cultured in vitro
for 2 days”. 

Cell number counting assay and BFU-E colony formation assay. In the BFU-E
culture system, after infection with viruses encoding GFP and either control
shRNA or Zfp36l2 shRNAs, the absolute numbers of GFP⁺ cells were counted
daily by flow cytometry, with counting beads used as an internal
counting standard. For the BFU-E colony formation assay, after 3 days of culture,
GFP⁺ cells were sorted by flow cytometry and cultured in methylcellulose medium
(MethoCult SF M3436 from StemCell Technologies), and the number of BFU-E
colonies containing a cluster of more than 20 CFU-E colonies was counted 9 days
after plating.

In vivo PHZ-induced haemolytic anaemia and bone marrow transplantation
mouse model. Lin⁻ cells were isolated from mouse E14.5 fetal liver, infected
with viruses encoding GFP and either control shRNA or Zfp36l2 shRNAs, and
transplanted into lethally irradiated recipient mice. Six to eight weeks after
transplantation, the recipient mice were intraperitoneally injected with 60 mg per kg
PHZ or control PBS on day 0 and day 1. On day 4, spleens were dissected and
measured for number and percentage of GFP⁺ cells and of each type of GFP⁺
haematopoietic lineage cell.

Measurement of corticosterone level. Mice were injected with PHZ or PBS, and
plasma was prepared 1 h after injection. Corticosterone levels were measured
using an ELISA kit according to the manufacturer's instructions
(Immunodiagnostic Systems).

RIP-chip and data analysis. RIP-chip was carried out according to the instructions
for the EZ-Magna RIP RNA-Binding Protein Immunoprecipitation Kit (Millipore).
BFU-Es were lysed and incubated with ZFP36L2 antibody (Abcam ab70775; 5 μg)
or control IgG (5 μg) conjugated with magnetic beads (50 μl) for 4 h. The bead, protein
and mRNA complexes were immunoprecipitated and magnetically separated. The
mRNAs were purified and analysed by microarray (Affymetrix mouse gene 1.0 ST
array). For each microarray probe, the ratio of its intensity in the ZFP36L2
antibody-immunoprecipitated sample to its intensity in the IgG-immunoprecipitated
sample was calculated.
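
The per-probe ratio and the selection of the most enriched probes can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the dictionary layout and the toy intensities are hypothetical, and any background correction or normalization applied to the arrays is not described in the text and is therefore omitted.

```python
def rank_probes_by_enrichment(ip_intensity, igg_intensity, top_n=2000):
    """Rank probes by the ratio of antibody-IP intensity to IgG-IP intensity.

    Both arguments are hypothetical dicts mapping probe id -> intensity.
    Probes missing from the IgG sample, or with a non-positive IgG
    intensity, are skipped because the ratio is undefined for them.
    """
    ratios = {}
    for probe, ip_value in ip_intensity.items():
        igg_value = igg_intensity.get(probe)
        if igg_value is None or igg_value <= 0:
            continue
        ratios[probe] = ip_value / igg_value
    # Highest ratio first: these are the probes most preferentially
    # recovered with the specific antibody.
    ranked = sorted(ratios, key=lambda probe: ratios[probe], reverse=True)
    return ranked[:top_n], ratios


# Toy data (invented values, for illustration only).
ip = {"probe_a": 800.0, "probe_b": 120.0, "probe_c": 300.0}
igg = {"probe_a": 100.0, "probe_b": 240.0, "probe_c": 100.0}
targets, ratios = rank_probes_by_enrichment(ip, igg, top_n=2)
print(targets)  # ['probe_a', 'probe_c']
```

With `top_n=2000` the same call would return a candidate list of the size used for the 'Targets' set in the analyses that follow.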

For the AU-rich element enrichment analysis, the 2,000 microarray probes 
corresponding to the mRNAs most preferentially bound by ZFP36L2 antibody 
are listed as ‘Targets’. The 2,000 microarray probes corresponding to the mRNAs
most preferentially bound by control IgG are listed as ‘Non-targets’. The
percentages of targets and non-targets containing the indicated AU-rich elements
in their 3′UTR were calculated.
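
As a hedged sketch of the motif-percentage calculation, the snippet below counts ATTTA pentamers in 3′UTR sequences and reports the fraction of genes carrying at least one. The function names and toy sequences are hypothetical, and counting overlapping occurrences is an assumption; the text does not say how overlapping motifs were handled.

```python
def count_motif(utr_sequence, motif="ATTTA"):
    """Count occurrences of a motif in a sequence, overlaps included.

    The sequence is upper-cased and U is read as T so that RNA-style
    input (AUUUA) is treated the same as DNA-style input (ATTTA).
    """
    seq = utr_sequence.upper().replace("U", "T")
    motif = motif.upper()
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        # Advance by one base only, so overlapping motifs are counted.
        start = seq.find(motif, start + 1)
    return count


def fraction_with_motif(utr_by_gene, motif="ATTTA"):
    """Fraction of genes whose 3'UTR contains the motif at least once."""
    if not utr_by_gene:
        return 0.0
    hits = sum(1 for seq in utr_by_gene.values() if count_motif(seq, motif) > 0)
    return hits / len(utr_by_gene)


# Toy 3'UTRs (invented sequences, for illustration only).
toy_utrs = {"gene1": "CCATTTAGG", "gene2": "GGGGCCCC"}
print(count_motif("ATTTATTTA"))       # 2 (the two copies share one A)
print(fraction_with_motif(toy_utrs))  # 0.5
```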

For the cumulative distribution plot, ‘Targets’ represents the 2,000 microarray
probes corresponding to the mRNAs most preferentially bound by ZFP36L2
antibody in the RIP-chip experiment. ‘Non-targets’ represents all remaining genes.
The relative expression of each gene was calculated as a log₂ ratio between its
intensity in CFU-Es and its intensity in BFU-Es¹ (x axis). The cumulative
fraction (y axis) is plotted as a function of the relative expression.
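
The cumulative-fraction curve and the Kolmogorov-Smirnov comparison of two such curves can be sketched in plain Python. This is an illustration under stated assumptions: the function names and toy values are hypothetical, only the KS statistic (the largest vertical gap between the two curves) is computed, and the P value reported in the figures would come from a statistics package, which the text does not name.

```python
import math


def log2_ratio(cfu_e_intensity, bfu_e_intensity):
    """Relative expression as log2(CFU-E / BFU-E)."""
    return math.log2(cfu_e_intensity / bfu_e_intensity)


def ecdf(values):
    """Return (value, cumulative fraction) pairs for a cumulative plot."""
    ordered = sorted(values)
    n = len(ordered)
    return [(value, (i + 1) / n) for i, value in enumerate(ordered)]


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum gap between the two ECDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both ECDFs past every observation equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


# Toy log2 ratios (invented values, for illustration only).
targets = [1.0, 2.0, 3.0]
non_targets = [4.0, 5.0, 6.0]
print(ks_statistic(targets, non_targets))  # 1.0, the curves never overlap
```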

For the correlation analysis, the 2,000 transcripts that are most preferentially 
found in the ZFP36L2 immunoprecipitate in the RIP-chip experiments were 
ranked based on their relative expression levels calculated as a ratio of their 
expression levels in CFU-Es relative to BFU-Es, and were classified into 8 groups 
each with 250 transcripts based on their relative expression level ranking. One-way
analysis of variance (ANOVA) was performed, using GraphPad software, to test the
statistical significance of the difference among the average numbers of ATTTA
elements in the 3′UTR of each gene across these 8 groups. A test for linear trend
after ANOVA, also in GraphPad, was used to assess whether the average number of
ATTTA elements in the 3′UTR of each gene increases systematically as the rank of
the average relative expression level of the 8 groups increases. The average number
of ATTTA elements in the 3′UTR of each gene, and the percentage of genes with
ATTTA elements in their 3′UTRs, were each plotted against the average relative
expression level of each of the 8 groups. Linear trend lines were drawn, and the
P values and r² values of the Pearson correlation test were
calculated for these two plots.
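
The grouping and correlation steps can be illustrated with a short sketch. It is not a reimplementation of the GraphPad analysis: the ANOVA and the test for linear trend are omitted, the function names and toy records are hypothetical, and only the ranking into equal-sized groups, the per-group averages and the Pearson coefficient are shown.

```python
from statistics import mean


def split_into_ranked_groups(records, n_groups=8):
    """Sort (relative_expression, motif_count) records by expression and
    split them into n_groups consecutive groups of equal size.

    With 2,000 records and 8 groups each group holds 250 records; any
    remainder left by an uneven division is dropped in this sketch.
    """
    ordered = sorted(records, key=lambda rec: rec[0])
    size = len(ordered) // n_groups
    return [ordered[k * size:(k + 1) * size] for k in range(n_groups)]


def group_means(groups):
    """Average relative expression and average motif count per group."""
    expression = [mean(rec[0] for rec in group) for group in groups]
    motifs = [mean(rec[1] for rec in group) for group in groups]
    return expression, motifs


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Toy records (invented values): motif count rises with expression rank.
toy = [(i / 10, i // 2) for i in range(16)]
groups = split_into_ranked_groups(toy, n_groups=8)
expr_means, motif_means = group_means(groups)
r = pearson_r(expr_means, motif_means)
print(round(r * r, 6))  # r squared for the toy data
```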
GR ChIP-seq. A total of ~7 × 10⁷ primary BFU-E cells were purified from mouse
E14.5 fetal liver¹. Cells were incubated at 37 °C for 4 h in SFEM (StemSpan)
medium containing SCF (100 ng ml⁻¹), EPO (2 U ml⁻¹) and IGF-1 (40 ng ml⁻¹),
followed by 1 h stimulation with 100 nM dexamethasone. Cells were then
chemically crosslinked with 1% formaldehyde solution for 15 min at room temperature,
lysed and sonicated to solubilize and shear crosslinked DNA in sonication buffer
(50 mM HEPES pH 7.5, 40 mM NaCl, 1 mM EDTA, 1 mM EGTA, 1% Triton
X-100, 0.1% Na-deoxycholate and 0.1% SDS). The glucocorticoid receptor (GR)
was immunoprecipitated overnight at 4 °C with 10 μg of a combination of two
antibodies bound to magnetic beads (Dynabeads, Invitrogen). The antibodies used
were mouse monoclonal FiGR (sc-12763) and MA1-510 (BuGR clone, Thermo
Scientific). The beads containing the GR bound to DNA were washed once with
low salt buffer (20 mM Tris pH 8, 2 mM EDTA, 0.1% SDS, 1% Triton X-100,
150 mM NaCl), once with high salt buffer (20 mM Tris pH 8, 2 mM EDTA,
0.1% SDS, 1% Triton X-100, 500 mM NaCl), and once with LiCl buffer (10 mM
Tris pH 8, 1 mM EDTA, 1% NaDOC, 1% NP40, 150 mM LiCl). Bound complexes
were eluted from the beads by heating at 65 °C overnight in elution buffer (50 mM
HEPES pH 8, 10 mM EDTA, 200 mM EDTA and 1% SDS). Whole-cell extract
DNA was also treated for crosslink reversal and was used as a background control.
Immunoprecipitated DNA and whole-cell extract DNA were then purified by
treatment with RNase A, proteinase K and a phenol:chloroform:isoamyl alcohol
extraction.

Purified DNA was prepared for sequencing according to a modified version of 
the Solexa Genomic DNA protocol. Fragmented DNA was end-repaired and 
adapters were ligated. An additional gel extraction step was added to the 
Illumina protocol at this step, allowing us to collect the material between 100
and 300 bp. The purified DNA was subjected to 18 cycles of linker-mediated
PCR as per the Illumina protocol. Amplified fragments between 200 and 300 bp 
were isolated by agarose gel electrophoresis and purified. High-quality samples 
were confirmed by the appearance of a smooth smear of fragments from 100 to 
1,000 bp, with a peak distribution between 150 and 300 bp. 

Sequence reads were aligned to the mouse genome (NCBI Build 37, version
mm9) using the model-based analysis of ChIP-Seq (MACS). Sequences uniquely
mapping to the genome with zero or one mismatch were used in further analysis. 
Genomic bins with a normalized ChIP-Seq density greater than a defined thresh- 
old were considered enriched or “bound,” based on a P value of less than 107°. 
Luciferase reporter assay. 293T cells were seeded into 96-well plates 24 h before
transfection. For the ZFP36L2 enhancer experiment, 10 ng luciferase reporter plasmid
or control empty vector plasmid was co-transfected with 5 ng GR into 293T cells
using Lipofectamine 2000 transfection reagent (Invitrogen). Cells were cultured
in medium containing 1 μM DEX and lysed 48 h after transfection. For the 3′UTR
experiment, 10 ng luciferase reporter plasmid or control vector plasmid was
co-transfected with 150 ng of XZ-ZFP36L2 into 293T cells using Lipofectamine
2000 transfection reagent (Invitrogen), followed by 48 h of culture. Luciferase
activities were detected using a dual luciferase kit (Promega).

RNA-seq, microarray and RT-PCR. For RNA-seq, samples were prepared
using the RNA Sample Prep Kit (Illumina) and sequenced using an Illumina
genome analyser at the Whitehead Institute.

For microarray experiments, RNA was extracted by using a miRNeasy Mini kit 
(Qiagen), and microarrays were performed by using the Mouse GE 4x44k micro- 
array (Agilent) at Whitehead Institute. 

For RT-PCR, RNA was extracted by using a miRNeasy Mini kit (Qiagen). 
Reverse transcription was carried out using SuperScript II Reverse Transcriptase 
(Invitrogen). Real-time PCR was performed by using SYBR Green PCR Master 
Mix (Applied Biosystems) and 7500 Real-Time PCR System (Applied Biosystems). 
The following primer sequences were used for real-time PCR: Zfp36l2, forward,
GGCCGCACAAGCACAAGC, reverse, GAGACTCGAACCAAGATGAATAACG;
Aff1, forward, GCCTAACACTTCCTCCTGACACA, reverse,
CTGCCTACAGCCCAAAGTCAA; Mafk, forward, GCGGCGCACACTCAAGA, reverse,
TTTCTGTGTCACACGCTTGATG; Nfe2l1, forward, CCCCAGAAGGCCTTTGTAACT,
reverse, TCCAAGAGCATCTTCCCTTCA.
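
For readers unfamiliar with reference-gene normalization, the sketch below shows one common way to turn real-time PCR threshold cycles into a relative expression level normalized to 18S rRNA. It is an assumption that a comparative Ct calculation of this kind was used: the text states only that values were normalized to 18S rRNA. The function name and Ct values are hypothetical, and the formula assumes that each PCR cycle doubles the product.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_control, ct_reference_control):
    """Comparative Ct estimate of target expression relative to a control.

    ct_target / ct_reference: threshold cycles for the gene of interest
    and the reference gene (here 18S rRNA) in the sample of interest.
    The *_control arguments are the same measurements in the control
    sample. Assumes a doubling of product per cycle.
    """
    delta_sample = ct_target - ct_reference
    delta_control = ct_target_control - ct_reference_control
    return 2.0 ** -(delta_sample - delta_control)


# Invented Ct values: the target crosses threshold two cycles later in
# the knockdown sample, with the reference gene unchanged.
level = relative_expression(26.0, 10.0, 24.0, 10.0)
print(level)  # 0.25, i.e. a fourfold reduction
```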

Western blot. Protein was extracted in lysis buffer (150 mM sodium chloride;
1.0% NP-40 or Triton X-100; 0.5% sodium deoxycholate; 0.1% SDS; 50 mM Tris,
pH 8). SDS-PAGE was performed using the NuPAGE Novex Bis-Tris Gel Systems
(Invitrogen). After electrophoretic transfer, the PVDF membranes with protein
were incubated with the primary antibody for ZFP36L2 (Abcam ab70775; at a dilution of
1:1,000) or with the primary antibody for GFP (Abcam ab290; at a dilution of 1:1,000) at
4 °C overnight. After washing and incubating with HRP-conjugated secondary
antibody at room temperature for 1 h, membranes were developed.

Immunoprecipitation experiment. The XZ-ZFP36L2-GFP construct encoding
the GFP-ZFP36L2 fusion protein was transfected into 293T cells. Two days after
transfection, cells were lysed and immunoprecipitations were performed using
either control IgG or ZFP36L2 antibody (Abcam ab70775) with the same amount
of input cell lysate, according to the instructions for the EZ-Magna RIP RNA-
Binding Protein Immunoprecipitation Kit (Millipore). Briefly, 293T cells were
lysed and incubated with control IgG (5 μg) or ZFP36L2 antibody (Abcam
ab70775; 5 μg) conjugated with magnetic beads (50 μl) for 4 h. The bead and
protein complexes were immunoprecipitated and magnetically separated. The
proteins were purified and analysed by western blot using the ZFP36L2 antibody
(Abcam ab70775).

Plasmids. The shRNA sequences targeting mouse Zfp36l2, c-Kit, Hopx and Nlrp6
from the Broad Institute RNAi Consortium shRNA library were cloned into the
MSCV-GFP vector. The shRNA sequences are: Zfp36l2, shRNA1,
aaaaCCAAACACTTAGGTCTCAGATgtcgacATCTGAGACCTAAGTGTTTGG; shRNA2,
aaaaGCACCACAACTCAATATGAAAgtcgacTTTCATATTGAGTTGTGGTGC;
c-Kit, shRNA1,
aaaaCGGCTAACAAAGGGAAGGATTgtcgacAATCCTTCCCTTTGTTAGCCG; shRNA2,
aaaaCGGATCACAAAGATTTGCGATgtcgacATCGCAAATCTTTGTGATCCG;
Hopx, shRNA1,
aaaaGCAGACGCAGAAATGGTTTAAgtcgacTTAAACCATTTCTGCGTCTGC; shRNA2,
aaaaAGTACAACTTCAACAAGGTCAgtcgacTGACCTTGTTGAAGTTGTACT; shRNA3,
aaaaCCTTCGGAATGCAGATCTGTTgtcgacAACAGATCTGCATTCCGAAGG;
Nlrp6, shRNA1,
aaaaGACCTCCAAGAGGTGATCAATgtcgacATTGATCACCTCTTGGAGGTC; shRNA2,
aaaaCTGGATCATCATAAAGCACAAgtcgacTTGTGCTTTATGATGATCCAG.
Sequences from the mouse Aff1 3′UTR (positions 5318-6758, RefSeq
NM_001080798), 1,441 bp in length and containing 2 ‘ATTTA’ motifs, were
PCR-amplified from mouse genomic DNA and cloned into the luciferase reporter
vector psiCHECK2. The primers used for PCR amplification were: forward,
GGGCTCGAGTTCTTGGTACCTTGGTTAAATC, reverse,
GGGGCGGCCGCCCCAACTCATCTCGAATTTCAC. For the mutagenesis experiment,
the two ‘ATTTA’ motifs of the Aff1 3′UTR were mutated into ‘TGGGC’. Sequences
from the mouse Mafk 3′UTR (positions 1730-2825, RefSeq NM_010757), 1,096 bp in
length and containing 5 ‘ATTTA’ motifs, were PCR-amplified from mouse genomic
DNA and cloned into psiCHECK2. The primers used for PCR amplification were:
forward, GGGGTTTAAACGAGCTCTGGGGCCACTGGAG, reverse,
GGGGCGGCCGCCATCCCAAACAGGAAATTC. Sequences from the mouse Nfe2l1
3′UTR (positions 3736-4614, RefSeq NM_008686), 879 bp in length and containing
1 ‘ATTTA’ motif, were PCR-amplified from mouse genomic DNA and cloned into
psiCHECK2. The primers used for PCR amplification were: forward,
GGGGTTTAAACGCTTCCTCTGCAGGGTCTAAAG, reverse,
GGGGCGGCCGCGTCATGTGCTCACAGCATTTC. The ZFP36L2 overexpression
construct was made by inserting the ORF of Zfp36l2 into the MICD4 vector.
Zfp36l2 enhancer regions (chr17:84500031-84500271; chr17:84515492-84516478;
chr17:84518538-84519282; chr17:84585041-84585584; and
chr17:84595103-84595494) were each PCR-amplified from mouse
genomic DNA and cloned into the luciferase reporter vector pGL3-Basic. The
XZ-ZFP36L2-GFP construct was made by inserting the ORF of Zfp36l2 without its
stop codon, followed in frame by the ORF of GFP, into the BglII and NcoI sites of
the XZ vector. The XZ-ZFP36L2-IRES-GFP construct was made by inserting the ORF
of Zfp36l2 into the BglII and EcoRI sites of the XZ vector.
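
The shRNA oligonucleotides listed in the Plasmids section appear to share one layout: a lowercase `aaaa` prefix, a 21-nucleotide sense strand, a lowercase `gtcgac` loop and the reverse complement of the sense strand. That layout is inferred from the sequences themselves rather than stated in the text, so the sketch below is only a consistency check, useful for catching transcription errors in such sequences. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def split_hairpin(oligo, prefix="aaaa", loop="gtcgac"):
    """Split an oligo into (sense, antisense) around the lowercase loop.

    Whitespace is removed first, so sequences broken across lines can be
    pasted in directly. The prefix and loop are matched case-sensitively,
    which keeps them distinct from the uppercase stem sequences.
    """
    cleaned = "".join(oligo.split())
    if not cleaned.startswith(prefix):
        raise ValueError("oligo does not start with the expected prefix")
    sense, separator, antisense = cleaned[len(prefix):].partition(loop)
    if not separator:
        raise ValueError("loop sequence not found")
    return sense, antisense


def is_valid_hairpin(oligo):
    """True if the antisense arm is the reverse complement of the sense arm."""
    sense, antisense = split_hairpin(oligo)
    return reverse_complement(sense) == antisense.upper()


# Zfp36l2 shRNA1 as listed in the Plasmids section.
zfp36l2_shrna1 = "aaaaCCAAACACTTAGGTCTCAGATgtcgacATCTGAGACCTAAGTGTTTGG"
print(is_valid_hairpin(zfp36l2_shrna1))  # True
```

A stray character in either arm makes the check fail, which is how a copying error in a listed sequence would show up.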
Obesity-induced gut microbial metabolite promotes
liver cancer through senescence secretome 
Obesity has become more prevalent in most developed countries
over the past few decades, and is increasingly recognized as a major 
risk factor for several common types of cancer’. As the worldwide 
obesity epidemic has shown no signs of abating’, better understanding 
of the mechanisms underlying obesity-associated cancer is urgently 
needed. Although several events were proposed to be involved in 
obesity-associated cancer’, the exact molecular mechanisms that 
integrate these events have remained largely unclear. Here we show 
that senescence-associated secretory phenotype (SASP)** has crucial 
roles in promoting obesity-associated hepatocellular carcinoma 
(HCC) development in mice. Dietary or genetic obesity induces 
alterations of gut microbiota, thereby increasing the levels of deoxy- 
cholic acid (DCA), a gut bacterial metabolite known to cause DNA 
damage’. The enterohepatic circulation of DCA provokes SASP 
phenotype in hepatic stellate cells (HSCs)’, which in turn secretes 
various inflammatory and tumour-promoting factors in the liver, 
thus facilitating HCC development in mice after exposure to chem- 
ical carcinogen. Notably, blocking DCA production or reducing gut 
bacteria efficiently prevents HCC development in obese mice. 
Similar results were also observed in mice lacking an SASP inducer* 
or depleted of senescent HSCs, indicating that the DCA-SASP axis 
in HSCs has key roles in obesity-associated HCC development. 
Moreover, signs of SASP were also observed in the HSCs in the area 
of HCC arising in patients with non-alcoholic steatohepatitis’, indi- 
cating that a similar pathway may contribute to at least certain 
aspects of obesity-associated HCC development in humans as well. 
These findings provide valuable new insights into the development 
of obesity-associated cancer and open up new possibilities for its 
control. 

Cellular senescence is a process occurring in normal cells in response 
to telomere erosion or oncogene activation, acting through checkpoint 
activation and stable cell-cycle arrest as a barrier to tumorigenesis”"®. 
Recent studies, however, reveal that senescent cells also develop a secret- 
ory profile composed mainly of inflammatory cytokines, chemokines 
and proteases, a typical signature termed the senescence-associated 
secretory phenotype (SASP)* or the senescence messaging secretome 
(SMS), hereafter referred to as SASP. Some of the SASP factors have 
cell-autonomous activities that reinforce senescence cell-cycle arrest* 
and/or promote clearance of senescent cells''’”, but other SASP factors 
have cell non-autonomous functions associated with inflammation and 
tumorigenesis promotion’, indicating that SASP contributes positively 
and negatively to cancer development, depending on the biological 
context*’. Because some of the SASP factors, such as IL-6 and PAI-1*”, 
are known to increase cancer risk in obesity'”*, we propose that SASP 
may contribute to obesity-associated cancer. 

To explore this possibility, we first set up a system to examine the 
impact of dietary obesity on tumorigenesis, using wild-type C57BL/6 


mice. However, we were unable to detect a statistically significant 
difference in cancer development between obese mice fed a high-fat 
diet (HFD) and lean mice fed a normal diet (data not shown), implying 
that a certain level of oncogenic stimuli might be required for obesity- 
associated cancer, especially in wild-type mice maintained in a spe- 
cific pathogen free (SPF) environment. Because the Ras-pathway is 
frequently activated in human cancers, including hepatocellular car- 
cinoma (HCC)"™, we decided to use a treatment with DMBA (7,12- 
dimethylbenz(a)anthracene, a chemical carcinogen that causes an 
oncogenic Ras mutation) at the neonatal stage, a protocol known to 
generate a variety of tumours throughout the body”*. In this setting, we 
also took advantage of p21-p-luc mice, in which the expression of
the p21Waf1/Cip1 gene (a senescence inducer, also known as Cdkn1a)
can be monitored noninvasively using a bioluminescence imaging 
(BLI) technique’’. The neonatal p21-p-luc mice were therefore treated 
with a single application of DMBA, followed by feeding either HFD 
or normal diet for 30 weeks (Fig. 1a). Interestingly, a marked increase 
of the bioluminescent signal was observed in the abdomen of the 
obese mice, and it originated mainly from liver cancer (Supplementary
Fig. 1). Notably, all HFD-fed mice developed HCC, whereas only
5% of mice fed a normal diet developed malignant tumours, in the lung but
not the liver (Fig. 1b-e and Supplementary Fig. 2). Importantly,
similar HCC development was also observed when genetically obese
(ob/ob, also known as Lep) mice were treated with DMBA at the
neonatal stage (Supplementary Fig. 3a-d), indicating that obesity itself,
rather than the HFD, promotes HCC development.

Because the induction of p21Waf1/Cip1 expression was observed
in liver, particularly in the area of liver cancer (Supplementary
Fig. 1c), we speculated that senescent cells might be present in the
vicinity of cancerous hepatocytes. Indeed, p21Waf1/Cip1 expression
was observed only in activated hepatic stellate cells (HSCs), which
express α-smooth muscle actin (α-SMA) and desmin’ (Fig. 1f).
Furthermore, a number of other senescence markers”’®, such as
p16Ink4a expression, signs of DNA damage (53BP1 foci and γH2AX
foci) and inhibited cell proliferation (the absence of bromodeoxyuridine
incorporation and Ki-67 expression), were also observed in activated
HSCs despite the absence of an oncogenic ras mutation (Fig. 1f and
Supplementary Figs 3e and 4). Moreover, increased
expression of IL-6, Gro-α and CXCL9 (major components of SASP)*°,
but not HGF (a differentiation marker)’, was observed in activated
HSCs, but not in other types of liver cells (Fig. 1f and Supplementary
Figs 3e and 5), indicating that these activated HSCs are senescing and
may promote obesity-associated HCC development via SASP. It
should be noted that, unlike the study using carbon tetrachloride
(CCl4)", fibrosis was not apparent in HFD-fed mice (Supplementary
Fig. 6), precluding the possibility that the appearance of senescent
HSCs was a by-product of liver fibrosis.

Figure 1 | Cellular senescence in HSCs. a, Timeline of the experimental
procedure (n = 19 per group). Eut, euthanasia; ND, normal diet.
b, Representative macroscopic photographs of livers. Arrowheads indicate
HCCs. c, The ratios of cancer formation. d, The average liver tumour numbers
and their relative size distribution. e, The average body weights at the age of
30 weeks. f, Immunofluorescence analysis of liver sections. HSCs were visualized
by α-SMA staining and DNA was stained by 4′,6-diamidino-2-phenylindole
(DAPI). Scale bars, 2.5 μm. Arrowheads indicate α-SMA-expressing cells that
were positive for the indicated markers. The histograms indicate the percentages of
α-SMA-expressing cells that were positive for the indicated markers. At least 100
cells were scored per group. For all graphs, error bars indicate mean ± standard
deviation (s.d.). **P < 0.01.


To ascertain the role of SASP in obesity-associated HCC development,
we next sought evidence that blocking SASP can reduce
obesity-associated HCC development. Although we were unable to
detect the expression of IL-1α (an upstream regulator of SASP induction)*
in HSCs, significant induction of IL-1β (a functional homologue
of IL-1α) and its activator, caspase-1 (an essential component
of the inflammasome), was observed in senescent HSCs (Fig. 2a-c).
Moreover, the addition of recombinant IL-1β caused the dose-dependent
induction of Il-6 and Gro-α (also known as Cxcl1) gene expression in
cultured primary murine HSCs (Supplementary Fig. 7a), indicating that
inflammasome activation and subsequent IL-1β maturation can act as
an upstream regulator of SASP induction in HSCs. Indeed, the levels of
SASP factor expression in activated HSCs were substantially diminished
in mice lacking the Il-1β gene (Il-1β−/− mice, also known as
Il1b−/−) (Fig. 2c), and the numbers and sizes of the liver tumours that
developed in Il-1β−/− mice were strikingly reduced, as compared with
wild-type mice (Fig. 2d, e), although the degree of steatohepatitis was
not attenuated (Supplementary Fig. 8a, b). It should be noted, however,
that other senescence markers, such as 53BP1 foci, pie pt expres- 
sion and inhibited cell proliferation, were still observed in the activated 
HSCs of Il-1β−/− mice (Fig. 2c and Supplementary Fig. 8c). These
results are somewhat consistent with a recent observation that the
expression of p21Waf1/Cip1 can induce senescence cell-cycle arrest without
SASP induction”, suggesting that SASP, but not senescence cell-cycle
arrest, promotes obesity-associated HCC development.

Figure 2 | IL-1β deficiency alleviates obesity-induced HCC development.
a, Timeline of the experimental procedure (wild type (WT), n = 19;
Il-1β−/−, n = 9). b, The average body weights at the age of 30 weeks.
c, Immunofluorescence analysis of liver sections. HSCs were visualized by
α-SMA staining and DNA was stained by DAPI. Scale bars, 2.5 μm. The
histograms indicate the percentages of α-SMA-expressing cells that were
positive for the indicated markers. At least 100 cells were scored per group.
d, Representative macroscopic photographs of livers. Arrowheads indicate
HCCs. e, The average liver tumour numbers and their relative size distribution.
For all graphs, error bars indicate mean ± s.d. **P < 0.01.

To further verify this idea, we next attempted to deplete senescent 
HSCs from obese wild-type mice treated with DMBA at the neonatal 
stage. As reported previously’*, an intravenous injection of liposomes 
carrying small interfering RNA (siRNA) against HSP47 substantially 
reduced the abundance of activated HSCs, coinciding with a signifi- 
cant reduction of HCC development (Supplementary Fig. 9a-f). Note 
that this was not accompanied by an attenuation of steatohepatitis 
(Supplementary Fig. 9g, h). These results, along with the data from 
Il-1β−/− mice (Fig. 2), strongly indicate that senescent HSCs have
enhancing roles in HCC development via SASP, at least in the neonatal
DMBA plus obesity-induced HCC model. It is also noteworthy that
neither the deletion of the Il-1β gene nor the depletion of senescent
HSCs caused appreciable weight loss (Fig. 2b and Supplementary Fig. 
9c), implying that there may be an indirect link between obesity and 
HCC development, at least in this experimental setting. These obser- 
vations then raised the question of how obesity provokes SASP in 
HSCs. 

Emerging evidence has indicated that alterations of intestinal 
microbiota are associated with obesity’’. Furthermore, the activation 
of toll-like receptor (TLR) 4 by lipopolysaccharide (LPS) from intest- 
inal Gram-negative bacteria has been shown to promote HCC 
development, in an HCC model using DEN (diethyl nitrosamine) plus 
CCl4 treatment”. We thus explored the possibility that intestinal bacteria
have key roles in obesity-associated HCC development. Indeed,
treatment with a well-established oral antibiotic cocktail (4Abx),
which reduces the number of commensal intestinal bacteria’, caused 
a marked reduction of HCC development, accompanied by a marked 
decrease in senescent HSCs in the neonatal DMBA plus obesity- 
induced HCC model (Fig. 3 and Supplementary Fig. 3). As reported”®, 
4Abx treatment resulted in not only a > 99.5% reduction of the pres- 
ence of bacterial 16S ribosomal RNA gene in faeces, but also an 
enlargement of caecum commonly observed in germ-free mice 
(Fig. 3b and data not shown). Unexpectedly, however, a slight increase, 
rather than decrease, in HCC development was observed in mice 
lacking the Tlr4 gene (Tlr4−/−) (Supplementary Fig. 10), indicating
that LPS from Gram-negative bacteria is unlikely to promote HCC 
development in this setting. Indeed, meta 16S rRNA gene sequencing 
analysis of the intestinal microbiota revealed that the percentage of 
Gram-positive bacterial strains indigenous to the human and rodent 
intestinal tracts’ was dramatically increased with a HFD (Fig. 4a). 
Moreover, a treatment with vancomycin (VCM), an antibiotic that 
preferentially targets Gram-positive bacteria, alone was sufficient to 
block HCC development and the appearance of senescent HSCs 
(Figs 3d-f, 4a and Supplementary Fig. 3). These results lead us to 
propose that the obesity-associated increase of Gram-positive bacteria 
may promote HCC development, presumably through the enterohe- 
patic circulation of gut bacterial metabolites or toxins. 

To substantiate this idea, the serum metabolites of HFD- and normal-diet-fed
mice were analysed by liquid chromatography mass spectrometry
(LC-MS). Interestingly, the level of deoxycholic acid (DCA), a
secondary bile acid produced solely by the 7α-dehydroxylation of
primary bile acids carried out by gut bacteria such as strains belonging
to Clostridium cluster XI and XIVa*° (VCM-sensitive Gram-positive bacteria),
was substantially increased by HFD feeding, and was reduced
by antibiotic treatments (Figs 3a and 4b). Note that DCA is known to
cause DNA damage through reactive oxygen species production”! and
DNA damage is a critical inducer of SASP***. Moreover, in addition
to colon carcinogenesis*?, DCA has been shown to enhance liver
carcinogenesis. These notions prompted us to examine whether DCA has
key roles in obesity-associated HCC development. To this end, we
attempted to lower the levels of DCA, by either decreasing the
7α-dehydroxylation activity with difructose anhydride III (DFA III)” or
stimulating bile acid secretion with ursodeoxycholic acid (UDCA)”*.
Notably, lowering the DCA concentration substantially reduced HCC
development, accompanied by a marked decrease in senescent HSCs in
obese mice treated with DMBA at the neonatal stage (Fig. 4b and
Supplementary Figs 11 and 12). In a reciprocal set of experiments, we
also assessed whether DCA feeding enhances HCC development in
mice treated with DMBA at the neonatal stage (Fig. 4c). Intriguingly,
although DCA feeding alone was insufficient to enhance HCC
development in lean mice fed a normal diet at 30 weeks (data not
shown), a significant enhancement of HCC development (Fig. 4c-f),
accompanied by the appearance of senescence cell-cycle arrest and
SASP in HSCs (Fig. 4g), was observed when HFD-fed mice treated
with 4Abx were fed DCA for 17 weeks.

Figure 3 | Antibiotics treatments alleviate obesity-induced HCC
development. a, Timeline of the experimental procedure (HFD, n = 19;
HFD+4Abx, n = 12; HFD+VCM, n = 6). b, The copy number of intestinal
bacteria in faeces of indicated mice. c, The average body weights at the age of
30 weeks. d, Representative macroscopic photographs of livers. Arrowheads
indicate HCCs. e, The average tumour numbers and their relative size
distribution. f, Immunofluorescence analysis of liver sections. HSCs were
visualized by α-SMA staining and DNA was stained by DAPI. Scale bars,
2.5 μm. The histograms indicate the percentages of α-SMA-expressing cells that
were positive for the indicated markers. At least 200 cells were scored per group.
For all graphs, error bars indicate mean ± s.d. **P < 0.01.

Figure 4 | Bacterial metabolite promotes obesity-induced HCC
development. a, The relative abundance of OTUs (%) in the faecal bacterial
community. Data are representative of five mice per group. b, Serum DCA
concentration (ND, n = 4; HFD, n = 6; HFD+VCM, n = 3; HFD+DFAIII,
n = 3; HFD+UDCA, n = 3; ob/ob, n = 3; ob/ob+4Abx, n = 3). Error bars
indicate mean ± s.e.m. c, Timeline of the experimental procedure (n = 3 per
group). d, Representative macroscopic photographs of livers. Arrowheads
indicate HCCs. e, The average tumour numbers and their relative size
distribution. f, The average body weight and serum DCA concentration.
g, Immunofluorescence analysis of liver sections. Scale bars, 2.5 μm. The
histograms indicate the percentages of α-SMA-expressing cells that were
positive for the indicated markers. At least 100 cells were scored per group. h, The
quantitative real-time PCR (qPCR) analysis of the baiJ gene in the faeces (180 mg)
of the indicated mice used in a. For all graphs except b, error bars indicate
mean ± s.d. *P < 0.05, **P < 0.01.

Notably, operational taxonomic unit (OTU)-based bacterial diver- 
sity analysis (Fig. 4a), in conjunction with a quantitative PCR analysis 
(Supplementary Fig. 13), revealed that the population of cluster XI of 
the genus Clostridium was strikingly increased in HFD-fed mice. 
Interestingly, phylogenetic analysis of the bacterial OTUs revealed 
that the population of Clostridium cluster XI is composed of a single 
bacterial taxon (OTU-1105) close to the DCA-producing strain 
Clostridium sordellii, and represents more than 12% of the faecal bac- 
teria in HFD-fed mice (Supplementary Fig. 14). Concordantly, more- 
over, the abundance of the baiJ gene, a gene involved in bile acid 
7α-dehydroxylation’’, was remarkably increased in faeces of mice
fed HFD and was reduced by VCM treatment (Fig. 4h). On the other 
hand, a bacterial taxon (OTU-154) close to other DCA producing 
strains belonging to Clostridium cluster XIVa (Clostridium hylemonae 
and Clostridium scindens) represents only 0.5% of the total faecal bac- 
teria in HFD-fed mice (Supplementary Fig. 14). Thus, although other 
bacteria may also be involved here, the simplest explanation for our 
data is that OTU-1105, belonging to Clostridium cluster XI, contributes
to an increase in the DCA level, at least to some extent, in HFD-fed mice.

Finally, to further support and extend our murine data to human 
biology, we tested whether IL-1β treatment can induce SASP in cultured
primary human HSCs. As in murine HSCs, the addition of
recombinant IL-1β caused the induction of IL-6 and IL-8 (a functional
homologue of murine Gro-α) gene expression in cultured primary
human HSCs (Supplementary Fig. 7b). Importantly, moreover, signs 
of cellular senescence and SASP were also observed in the HSCs with- 
out serious fibrosis in the area of HCC arising in patients with non- 
alcoholic steatohepatitis (NASH)’ (8 out of 26) (Supplementary Fig. 
15). This is somewhat consistent with previous observations that repli- 
cative senescence of cultured human HSCs is accompanied by a pro- 
nounced inflammatory but less fibrogenic phenotype** and a certain 
percentage of NASH-associated HCC arose from the non-cirrhotic
liver?’. Unlike rodents, the human liver cannot 7α-hydroxylate DCA,
forming cholic acid®. Hence, DCA can accumulate to very high levels 
(>50%) in the bile acid pool of humans’. These data, together with 
the previous observation that high fat consumption resulted in higher 
faecal DCA concentrations in healthy male volunteers (ages 20-60)”°, 
suggest that DCA-induced senescent HSCs may contribute to at least 
certain aspects of obesity-associated HCC development via SASP in 
humans as well. 

It should be noted that although many of the perturbations, for 
example, the Il-1β knockout, antibiotics treatment and lower DCA
levels, significantly prevent HCC development, residual HCCs were 
still observed with these perturbations (Figs 2e and 3e and 
Supplementary Figs 11c and 12c). These results, in conjunction with 
the observation that DCA-feeding alone was insufficient to enhance 
HCC development in lean mice fed a normal diet until at least 30 weeks 
(data not shown), imply that an additional factor associated with 
obesity may exist to promote obesity-associated HCC development. 
Nevertheless, combining published data’*'*”?”**° with our findings, it 
is clear that the increased levels of DCA produced by gut bacteria play 
key roles in the promotion of obesity-associated HCC development via 
provoking SASP in HSCs, at least in the neonatal DMBA plus obesity- 
induced HCC model (Supplementary Fig. 16). A greater understand- 
ing of the molecular mechanisms linking gut microbial metabolite to 
SASP will therefore provide valuable new insights into how to bypass 
this undesirable side effect of cellular senescence. 
METHODS SUMMARY

Chemically induced carcinogenesis. DMBA treatments'* consisted of a single
application of 50 μl of a 0.5% solution of DMBA (7,12-dimethylbenz(a)anthracene,
Sigma) in acetone to the dorsal surface on postnatal day 4-5. Mother mice with
pups were then fed either normal diet or HFD until weaning. At the age of 4 weeks 
old, pups were weaned and continuously fed either normal diet or HFD until 
euthanized. 

Bacterial 16S rRNA amplicon sequencing and analysis. Bacterial genomic DNA 
was isolated from mice faeces, amplified for V1-V4 hypervariable regions of the 
16S rRNA gene, and used for pyrosequencing analysis. 


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS

Mice and diet. The p21-p-luc mice (CD1)'* were backcrossed with C57BL/6 mice
for eight generations. The leptin-deficient (ob/ob) mice (C57BL/6) were purchased
from Charles River Laboratories Japan, Inc. Tlr4−/− mice (C57BL/6) were purchased
from Oriental Bioservices. Il-1β−/− mice (C57BL/6) were provided by Y.
Iwakura*’. Male mice were used for all the experiments in this study. The mice
were maintained under specific pathogen-free (SPF) conditions, on a 12-h light-dark
cycle, and fed normal diet (CE-2 from CLEA Japan Inc., composed of
12 kcal% fat, 29 kcal% protein, 59 kcal% carbohydrates) or high-fat diet (HFD,
D12492 from Research Diets Inc., composed of 60 kcal% fat, 20 kcal% protein,
20 kcal% carbohydrates) ad libitum. Mice weighing more than 45 g at the age of
30 weeks were used as obese mice for all the experiments. We measured the
amount of food our mice ate and found that a 50 g HFD-fed mouse eats 3.44 g of food a
day. This equates to 1.2 g of fat per day, or 24 g per kg. According to the Reagan-Shaw
equation® (human equivalent dose (mg kg−1) = mouse dose (mg kg−1) ×
mouse Km factor ÷ human Km factor, where the mouse and human Km factors are
3 and 37, respectively), this is equivalent to a 70 kg human eating 136 g of fat a day.
The sample size used in this study was determined based on the expense of data
collection and the need to have sufficient statistical power. Randomization and
blinding were not used in this study. All animals were cared for according to
protocols approved by the Committee for the Use and Care of Experimental
Animals of the Japanese Foundation for Cancer Research (JFCR).
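
The dose conversion quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: the function and variable names are illustrative, and the fat intake of 1.2 g per day is taken as given from the text rather than derived from the diet composition.

```python
def human_equivalent_dose(animal_dose_per_kg, animal_km=3.0, human_km=37.0):
    """Reagan-Shaw conversion: HED = animal dose x (animal Km / human Km)."""
    return animal_dose_per_kg * animal_km / human_km

# Values quoted in the text (mouse Km = 3, human Km = 37).
mouse_weight_kg = 0.050        # 50 g HFD-fed mouse
fat_per_day_g = 1.2            # fat eaten per day, as stated
human_weight_kg = 70.0

mouse_dose_g_per_kg = fat_per_day_g / mouse_weight_kg        # about 24 g per kg
hed_g_per_kg = human_equivalent_dose(mouse_dose_g_per_kg)    # about 1.95 g per kg
human_fat_per_day_g = hed_g_per_kg * human_weight_kg         # about 136 g per day

print(round(mouse_dose_g_per_kg, 1), round(hed_g_per_kg, 2), round(human_fat_per_day_g))
```

Because the equation is a simple ratio of Km factors, the same function applies whether the dose is expressed in mg per kg or g per kg.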

Chemically induced carcinogenesis. DMBA treatments'* consisted of a single
application of 50 μl of a 0.5% solution of DMBA (7,12-dimethylbenz(a)anthracene,
Sigma) in acetone to the dorsal surface on postnatal day 4-5. After this application,
mother mice with pups were fed normal diet or HFD. At the age of 4 weeks,
pups were weaned and continuously fed either normal diet or HFD until euthanized.
Tumour number and size were evaluated by counting the
visible tumours and measuring their size.

Bioluminescence imaging. Bioluminescence imaging was performed as previously
described’***. In brief, for the detection of luciferase expression, mice were
anaesthetized and injected intraperitoneally with D-luciferin sodium salt (75 mg kg−1)
5 min before beginning photon recording. Mice were placed in a light-tight
chamber and a grey-scale image of the mice was first recorded with dimmed light,
followed by acquisition of a luminescence image using a cooled charge-coupled
device (CCD) camera (PIXIS 1024B; Princeton Instruments). The signal-to-noise
ratio was increased by 2 × 2 binning and 5 min exposure. For colocalization of the
luminescent photon emission on the animal body, grey-scale and pseudo-colour
images were merged using IMAGE-PRO PLUS (Media Cybernetics).

Antibiotics treatment. Antibiotics treatment was performed as previously
described” using a combination of four antibiotics (4Abx), ampicillin (1 g l−1),
neomycin (1 g l−1), metronidazole (1 g l−1) and vancomycin (500 mg l−1), or
vancomycin (500 mg l−1) alone (VCM), in drinking water from the age of 13 weeks
until killed.

Histology and immunofluorescence analysis. Haematoxylin and eosin staining 
and immunofluorescence analysis were performed as previously described'*. The 
primary antibodies used for mouse samples were as follows: α-SMA (Sigma
A5228), desmin (abcam ab15200), p21 (abcam ab2961), p16 (Santa Cruz
sc1207), 53BP1 (Santa Cruz sc22760), γ-H2AX (CST 9718), IL-6 (abcam
ab6672), Gro-α (abcam ab17882), Ki-67 (Thermo RM9106), bromodeoxyuridine
(abcam ab6326), caspase-1 (Millipore 06-503), IL-1β (R&D systems AF-401-NA),
HSP47 (Santa Cruz sc8352), CXCL9 (abcam ab137792), F4/80 (Invitrogen BM8)
and CD45 (Millipore 05-1416). The primary antibodies used for human samples
were as follows: α-SMA (Dako M0851), γ-H2AX (CST 9718), p16 (Santa Cruz
sc56330), p21 (CST #2947), IL-6 (abcam ab6672), IL-8 (abcam ab18672), 53BP1 
(Santa Cruz sc22760) and caspase-1 (Millipore 06-503). 

Quantitative PCR. Total RNA was extracted from mouse tissues using TRIzol 
reagent (Life technologies) and reverse transcription and quantitative PCR were 
performed as previously described”. Primers used were as follows: human GAPDH,
5′-CAACTACATGGTTTACATGTTC-3′ (forward) and 5′-GCCAGTGGACTCCACGAC-3′
(reverse); mouse Gapdh, 5′-CAACTACATGGTCTACATGTTC-3′ (forward) and
5′-CACCAGTAGACTCCACGAC-3′ (reverse); human IL-6,
5′-CTCGACGGCATCTCAGCCCTGA-3′ (forward) and
5′-CTGCCAGTGCCTCTTTGCTGCTTT-3′ (reverse); mouse Il-6,
5′-TGATTGTATGAACAACGATGATGC-3′ (forward) and
5′-GGACTCTGGCTTTGTCTTTCTTGT-3′ (reverse); human IL-8,
5′-AAGGAAAACTGGGTGCAGAG-3′ (forward) and 5′-ATTGCATCTGGCAACCCTAC-3′
(reverse); mouse Gro-α, 5′-GCTGGGATTCACCTCAAGAA-3′ (forward) and
5′-AGGTGCCATCAGAGCAGTCT-3′ (reverse); bacterial baiJ,
5′-TCAGGACGTGGAGGCGATCCA-3′ (forward) and 5′-TACRTGATACTGGTAGCTCCA-3′
(reverse); Clostridium cluster XI 16S rRNA gene,
5′-TGACGGTACYYNRKGAGGAAGCC-3′ (forward) and
5′-ACTACGGTTRAGCCGTAGCCTTT-3′ (reverse).


In vivo RNAi experiment. 250 μl of siRNA solution (3 mg ml−1) against HSP47
or control siRNA was mixed with 250 μl of complexation buffer and 500 μl of
Invivofectamine (Life Technologies), incubated for 30 min at 50 °C, and dialysed
at room temperature for 2 h in 1 l of PBS (pH 7.4). The dialysed
siRNA-Invivofectamine complex was collected and 3 μg per g (weight) was injected
through the tail vein twice a week for 15 weeks until the mice were killed. The
sequences of the HSP47-targeting oligo were as follows:
5′-GCACUGCUUGUGAACGCCAUGUUCU-3′ (sense),
5′-AGAACAUGGCGUUCACAAGCAGUGC-3′ (antisense).
As a negative control, Ambion In vivo Negative Control #1 siRNA (4457289) was
used.
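
As a quick consistency check on the oligo sequences quoted above (joined across the line break), the antisense strand of a blunt-ended siRNA duplex should be the exact reverse complement of the sense strand. The sketch below is only an illustration of that check; the helper name is ours and it assumes no overhangs or modified bases.

```python
# RNA base-pairing rules used for the check.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq))

sense = "GCACUGCUUGUGAACGCCAUGUUCU"
antisense = "AGAACAUGGCGUUCACAAGCAGUGC"

# True when the two strands pair perfectly along their full length.
is_perfect_duplex = reverse_complement_rna(sense) == antisense
print(len(sense), is_perfect_duplex)
```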

Treatment with DCA, UDCA and DFAIII. Deoxycholic acid (DCA) was dis- 
solved in absolute ethanol and diluted in 66% propylene glycol to reduce the 
concentration of alcohol to 5%. HFD-fed mice treated with DMBA at neonatal 
stage were fed a combination of four antibiotics (4Abx) with 40 μg per g (weight) of
DCA or vehicle (control) three times per week using a plastic feeding tube at the
age of 13 weeks old until killed. Ursodeoxycholic acid (UDCA) tablets
(Tanabe-Mitsubishi Pharma) were powdered and dissolved in 66% propylene glycol.
HFD-fed mice treated with DMBA at neonatal stage were fed 60 μg per g (weight) of
UDCA or vehicle (control) using a plastic feeding tube every day at the age of
15 weeks old until killed. Difructose anhydride III (DFAIII) was dissolved in saline.
HFD-fed mice treated with DMBA at neonatal stage were fed 0.1 mg per g (weight) 
of DFAIII or vehicle (control) using a plastic feeding tube every day at the age of 
17 weeks old until killed. 

Bacterial 16S rRNA amplicon sequencing and analysis. Bacterial genomic DNA 
was isolated from faeces using a QIAamp DNA Stool mini kit (QIAGEN), and 
100 ng of DNA was used for PCR of the V1-V4 hypervariable regions of the 16S
rRNA gene. Twenty-five cycles of amplification were performed with the universal 16S
rRNA primers 27F 5′-AGAGTTTGATCCTGGCTCAG-3′ and 519R
5′-GWATTACCGCGGCKGCTG-3′ with 10-bp barcode tags using KOD Fx plus
DNA polymerase (TOYOBO). All amplicons were sequenced on a 454 Genome
Sequencer FLX Titanium platform (Roche Diagnostics and Beckman Coulter
Genomics). Quality filter-passed sequence reads were obtained by removing reads
that did not contain both primer sequences, were shorter than 500 bp, had an average
quality value (QV) < 25, or were possibly chimaeric. Of the filter-passed reads,
more than 2,500 sequence reads per sample, with both primer sequences trimmed off,
were used and subjected to OTU analysis with a cutoff similarity of 97%
identity using QIIME software. Representative sequences from each OTU were
searched by BLAST against the Ribosomal Database Project (RDP) database and aligned. The
obtained OTU sequences were grouped at class level***’ and were subjected to
phylogenetic analysis using MEGA software as described previously”.
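
The read-filtering criteria described above can be expressed compactly in code. The following is a minimal sketch under stated assumptions, not the pipeline used in the study: the `Read` record and its flags are hypothetical, and primer detection and chimaera detection are assumed to have been done upstream by other tools.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Read:
    """Hypothetical record for one pyrosequencing read."""
    sequence: str
    qualities: List[int]          # per-base quality values (QV)
    has_forward_primer: bool      # assumed to be determined upstream
    has_reverse_primer: bool      # assumed to be determined upstream
    is_chimaeric: bool = False    # assumed to be flagged by a chimaera checker

def passes_filter(read, min_length=500, min_mean_qv=25.0):
    """Apply the four criteria from the text: primers, length, mean QV, chimaera."""
    if not (read.has_forward_primer and read.has_reverse_primer):
        return False
    if len(read.sequence) < min_length:
        return False
    if not read.qualities:
        return False
    if sum(read.qualities) / len(read.qualities) < min_mean_qv:
        return False
    return not read.is_chimaeric

# Invented example reads, one per failure mode plus one that passes.
good = Read("A" * 520, [30] * 520, True, True)
too_short = Read("A" * 450, [30] * 450, True, True)
low_quality = Read("A" * 520, [20] * 520, True, True)
missing_primer = Read("A" * 520, [30] * 520, True, False)
chimaera = Read("A" * 520, [30] * 520, True, True, is_chimaeric=True)

reads = [good, too_short, low_quality, missing_primer, chimaera]
kept = [r for r in reads if passes_filter(r)]
print(len(kept))
```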
Determination of the copy number of faecal bacteria. The copy number of
faecal bacteria was calculated from a standard curve of known bacterial copy
number by quantitative real-time PCR of the 16S rRNA gene using the 341f
(5′-CCTACGGGAGGCAGCAG-3′) and 534r (5′-ATTACCGCGGCTGCTGG-3′)
primers as described previously”®.
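
Copy-number estimation from a qPCR standard curve amounts to a linear fit of threshold cycle (Ct) against log10 copy number, followed by inversion of that fit for each unknown sample. The sketch below illustrates the idea only; the standards and Ct values are invented for the example and the function names are ours, not taken from the study.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the copy number of a sample."""
    return 10 ** ((ct - intercept) / slope)

# Invented dilution series of a standard with known 16S rRNA gene copy number.
standard_copies = [1e3, 1e4, 1e5, 1e6, 1e7]
standard_ct = [30.1, 26.8, 23.5, 20.2, 16.9]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
estimated = copies_from_ct(25.15, slope, intercept)
print(round(slope, 2), round(intercept, 1), round(estimated))
```

A slope close to -3.3 corresponds to roughly 100% amplification efficiency, which is why it is used for the invented standards here.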

Measurement of serum ALT and AST. The levels of serum alanine aminotrans- 
ferase (ALT) and aspartate aminotransferase (AST) were measured using kits from 
WAKO Pure Chemical Industries, Ltd. 

Measurement of serum deoxycholic acid. Metabolomic analysis of mouse
serum was performed by liquid chromatography mass spectrometry (LC-MS) at
Human Metabolome Technologies Inc., Japan, as previously described”. The
amount of serum DCA was measured by gas chromatography mass spectrometry
(GC-MS) at the Bile Acid Institute of Junshin Clinic, Japan, as described™.
Human subjects. Informed consent was obtained from all patients according to 
the protocol approved by the ethics committee of the Japanese Foundation for 
Cancer Research (JFCR). 

Statistical analysis. Data were analysed by unpaired t-test with Welch's correction
(two-sided) or Mann-Whitney test (two-sided). P values less than 0.05 were
considered significant.
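
For readers unfamiliar with Welch's correction, the sketch below shows how the test statistic and the Welch-Satterthwaite degrees of freedom are computed for two groups with unequal variances and unequal sizes. It is an illustration only: the data are invented, the function name is ours, and the two-sided P value (which requires the t distribution, for example via `scipy.stats.ttest_ind` with `equal_var=False`) is deliberately left out to keep the example dependency-free.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)   # squared standard error of group a
    vb = variance(b) / len(b)   # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Invented measurements for two groups of different size.
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3]

t_stat, dof = welch_t(group_a, group_b)
print(round(t_stat, 3), round(dof, 2))
```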

Cell culture. Murine primary HSCs were isolated as previously described'*”’, and
were cultured in Dulbecco’s modified Eagle’s medium supplemented with 10%
fetal bovine serum under 3% O2 and 5% CO2. Human primary HSCs were
purchased from Health Science Research Resources Bank and were grown in
Dulbecco’s modified Eagle’s medium supplemented with 10% fetal bovine serum
under 3% O2 and 5% CO2.

H-ras sequencing. Total RNA was prepared from HCCs and HSCs isolated from 
tumour regions using TRIzol reagent (Invitrogen). RNA was converted to cDNA 
by using oligo (dT) primer and a 330-bp PCR fragment containing exon 2 of H-ras 
gene was amplified with 5’-TGGGGCAGGAGCTCCTGGAT-3’ and 5’-GAA 
GGACTTGGTGTTGTTGA-3’ primers. PCR fragments were sub-cloned using 
Target Clone Plus system (TOYOBO) and were sequenced by using Dye- 
Terminator and Big-Dye cycle sequencing system (Applied Biosystems) as 
described previously”. 
Self-assembling influenza nanoparticle vaccines
elicit broadly neutralizing H1N1 antibodies

Srinivas S. Rao, Wing-Pui Kong, Lingshu Wang & Gary J. Nabel


Influenza viruses pose a significant threat to the public and are a 
burden on global health systems (refs 1, 2). Each year, influenza vaccines 
must be rapidly produced to match circulating viruses, a process 
constrained by dated technology and vulnerable to unexpected 
strains emerging from humans and animal reservoirs. Here we use 
knowledge of protein structure to design self-assembling nanoparticles 
that elicit broader and more potent immunity than traditional 
influenza vaccines. The viral haemagglutinin was genetically fused to 
ferritin, a protein that naturally forms nanoparticles composed of 24 
identical polypeptides (ref. 3). Haemagglutinin was inserted at the interface 
of adjacent subunits so that it spontaneously assembled and 
generated eight trimeric viral spikes on its surface. Immunization 
with this influenza nanoparticle vaccine elicited haemagglutination 
inhibition antibody titres more than tenfold higher than those from 
the licensed inactivated vaccine. Furthermore, it elicited neutralizing 
antibodies to two highly conserved vulnerable haemagglutinin 
structures that are targets of universal vaccines: the stem (refs 4, 5) and the 
receptor binding site on the head (refs 6, 7). Antibodies elicited by a 1999 
haemagglutinin-nanoparticle vaccine neutralized H1N1 viruses 
from 1934 to 2007 and protected ferrets from an unmatched 2007 
H1N1 virus challenge. This structure-based, self-assembling synthetic 
nanoparticle vaccine improves the potency and breadth of 
influenza virus immunity, and it provides a foundation for building 
broader vaccine protection against emerging influenza viruses and 
other pathogens. 

Influenza outbreaks arise from viruses that evade human immunity. 
Advances in influenza virus structural biology, nanotechnology and 
gene delivery offer new opportunities to develop improved vaccines 
that can confer more broadly protective immunity against diverse 
influenza viruses. Among recent innovations, several natural 
proteins have shown the ability to form nanoparticles well-suited for 
antigen presentation and immune stimulation. One such protein is 
ferritin, a ubiquitous iron storage protein that self-assembles into 
nanoparticles (ref. 3). Although ferritin has been used to display exogenous 
peptides, it has not been possible to display viral glycoproteins 
because of their complexity and requirements for oligomerization. 
Additionally, recombinant ferritins made in prokaryotic cells were 
not subjected to mammalian glycosylation and other post-translational 
modifications typical of viral proteins. Structural analysis of ferritin 
indicated that it would be possible to insert a heterologous protein, 
specifically influenza virus haemagglutinin (HA), so that it could 
assume the physiologically relevant trimeric viral spike (Fig. 1a). 
Ferritin forms a nearly spherical particle composed of 24 subunits 
arranged with octahedral symmetry around a hollow interior. The 
symmetry includes eight threefold axes on the surface. The aspartic 
acid (Asp) at residue 5 near the NH2 terminus is readily solvent accessible, 
and the distance between each Asp 5 on the threefold axis (28 Å) is 
almost identical to the distance between the central axes of each HA2 
subunit of trimeric HA (Fig. 1a, right). We therefore proposed that HA 
would trimerize properly if inserted into this structure. 
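
The counting argument behind this design can be made explicit with a small sketch. It assumes an idealized octahedral shell and represents the eight surface positions of the threefold axes by the body-diagonal directions (±1, ±1, ±1); the variable names are illustrative and do not come from the paper.

```python
from itertools import product

SUBUNITS = 24  # ferritin subunits per particle, as stated in the text

# In an octahedrally symmetric shell, the threefold axes pierce the surface
# along the eight body-diagonal directions (+-1, +-1, +-1).
threefold_points = list(product((-1, 1), repeat=3))

# Each threefold position is surrounded by three subunits, so one HA fused
# to every subunit can form one trimeric spike per position.
subunits_per_point = SUBUNITS // len(threefold_points)
spikes = len(threefold_points)

print(spikes, "threefold positions,", subunits_per_point, "subunits each")
```

Under these assumptions the 24 fused HA chains group into eight trimers, which matches the eight trimeric spikes described in the abstract.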


To test this hypothesis, we genetically fused the ectodomain of 
A/New Caledonia/20/1999 (1999 NC) HA to Helicobacter pylori 
non-haem ferritin (ref. 14) (Fig. 1a, bottom), a ferritin that diverges highly 
from its mammalian counterparts (Supplementary Fig. 1). This fusion 
protein was expressed in mammalian cells, and self-assembly of 
ferritin and HA-ferritin nanoparticles was confirmed by size exclusion 
chromatography and dynamic light scattering (Supplementary Fig. 2a, 
b). HA-ferritin also had the expected apparent molecular weight of 
85 kDa (Supplementary Fig. 2c). Whereas ferritin alone formed 
smooth spherical particles as visualized by transmission electron 
microscopy (TEM), HA-ferritin showed clearly visible spikes protruding 
from the spherical core (Fig. 1b, Ferritin np versus HA-np). 
Remarkably, the placement of these spikes illustrated the octahedral 
symmetry of the HA-nanoparticle design. Octahedral two-, three- and 
fourfold axes were distinctly observed in the TEM image (Fig. 1b, 
right). These data demonstrated the formation of trimeric HA spikes 
on self-assembling nanoparticles. 

To verify the antigenicity of the HA spikes on the nanoparticles, we 
analysed their reactivity with an anti-HA head monoclonal antibody and 
a conformation-dependent monoclonal antibody, CR6261, which 
recognizes a conserved structure on the HA stem (ref. 4). The HA-nanoparticle 
binds to the anti-head or the anti-stem monoclonal antibody with affinities 
similar to trimeric HA or trivalent inactivated influenza vaccine 
(TIV) containing the same 1999 NC HA at equimolar concentrations 
of HA (Supplementary Fig. 3a). Analogous to trimeric HA, the 
HA-nanoparticle also blocked neutralization by CR6261 and another 
stem-directed monoclonal single chain variable fragment antibody, F10 
(ref. 5) (Supplementary Fig. 3b). These results confirmed that HA molecules 
on the HA-nanoparticle antigenically resembled the physiological 
HA viral spike. 

To assess the immunogenicity of the HA-nanoparticle, mice were 
immunized twice with TIV or HA-nanoparticles with or without Ribi 
adjuvant. The HA-nanoparticles induced 1.6-fold higher haemagglutination 
inhibition (HAI) titres than TIV in the absence of adjuvant. 
Although this increase did not reach statistical significance, HAI titres 
were 7.2-fold higher in animals receiving HA-nanoparticles when Ribi 
was used (Fig. 2a, left; P < 0.0001), and a similar effect was observed in 
the neutralization and enzyme-linked immunosorbent assay (ELISA) 
titres (Fig. 2a, middle and right; P < 0.0001). For example, neutralization 
titres elicited by HA-nanoparticles, as assessed by the concentration 
of antibody needed to inhibit viral entry by 90% (IC90), were ~34 
times higher than TIV (Fig. 2a, middle). We also evaluated whether a 
similar immune response can be induced by HA-nanoparticles with 
MF59, an adjuvant that has been used in humans (ref. 15). Similarly high HAI 
and IC90 titres were observed in mice receiving MF59-adjuvanted 
HA-nanoparticles (4,608 ± 512 and 47,140 ± 22,561, respectively), and 
the titres were significantly higher than those induced by either 
non-adjuvanted HA-nanoparticles (P = 0.0005 and 0.0311 for HAI and 
IC90, respectively) or MF59-adjuvanted TIV (P = 0.0016 and 0.0388 
for HAI and IC90, respectively) (Fig. 2b). These results demonstrated 
the feasibility of using the HA-nanoparticles with an adjuvant suitable 
for humans (ref. 16). Because higher titres were observed using either adjuvant, 
further comparisons were performed with adjuvant (Ribi). Neutralization 
against a panel of H1N1 strains revealed not only increased potency, but 
also enhanced breadth, stimulated by HA-nanoparticles compared with 
TIV or trimeric HA (Fig. 2c). Neutralization against two unmatched, 
highly divergent H1N1 strains, A/Puerto Rico/8/1934 (1934 PR8) and 
A/Singapore/6/1986 (1986 Sing), was only observed in mice immunized 
with the HA-nanoparticles, and the titre against the contemporary strain 
A/Brisbane/59/2007 (2007 Bris) was more than tenfold higher in mice 
immunized with HA-nanoparticles than with TIV (Fig. 2c). 

Figure 1 | Molecular design and characterization of ferritin nanoparticles 
displaying influenza virus HA. a, A subunit of H. pylori non-haem ferritin 
(PDB: 3bve) (left). The NH2- and COOH-termini are labelled as N and C, 
respectively. Three subunits surrounding a threefold axis are shown (middle) 
and the Asp 5 is coloured in red. An assembled ferritin nanoparticle and an HA 
trimer (PDB: 3sm5) (viewed from membrane proximal end) (right). A triangle 
connecting the Asp 5 residues at the threefold axis is shown in red. The same 
triangle is drawn on the HA trimer (right). A schematic representation of the 
HA-ferritin fusion protein is shown (bottom). b, Negatively stained TEM 
images of nanoparticles (np) (left and middle). Computational models and 
observed TEM image (right, top and bottom panels) representing octahedral 
two-, three- and fourfold axes of HA-nanoparticles are shown as indicated. 
Visible HA spikes are numbered in the images. 

Figure 2 | Immune responses in HA-nanoparticle-immunized mice. a, HAI 
(left), IC90 neutralization (middle) and anti-HA antibody endpoint titres (right) 
after two immunizations of TIV or HA-nanoparticles with or without (−) Ribi. 
b, HAI (left) and IC90 (right) titres after two immunizations of TIV with MF59 
or HA-nanoparticles with or without (−) MF59. Two of five mice immunized 
with MF59-adjuvanted HA-nanoparticles exhibited IC90 titres >51,200 and 
were plotted as 51,200. The data are presented as box-and-whisker plots (boxed 
from lower to upper quartile with whiskers from minimum to maximum) with 
lines at the mean (n = 5). c, Neutralization breadth of the immune sera (with 
Ribi). IC50 titres against a panel of H1N1 pseudotyped viruses were determined. 
Heat map is coloured in a gradient from green to yellow to red reflecting the 
neutralization strength. d, Cellular (left and middle) and humoral (right) 
immune responses against H. pylori and mouse ferritins. Cells expressing 
interferon γ (IFN-γ), tumour necrosis factor α (TNF-α) or interleukin 2 (IL-2) 
upon stimulation with peptides covering H. pylori or mouse ferritins were 
combined and plotted as cytokine+. The data are presented as box-and-whisker 
plots with lines at the mean (n = 5). ICS, intracellular cytokine staining. 

We next examined whether pre-existing immunity to ferritin or to 
other HA subtypes would interfere with subsequent immunization 
using HA-nanoparticles. Mice pre-immunized with either H3 
(A/Perth/16/2009, 2009 Perth) HA-nanoparticles or empty ferritin 
nanoparticles generated substantial anti-H3 HA and/or anti-H. pylori 
ferritin antibody responses (Supplementary Fig. 4a). They were then 
immunized with H1 (1999 NC) HA-nanoparticles. Comparable HAI, 
IC90 and ELISA titres against 1999 NC HA were observed in naive 
animals as well as in groups pre-immunized with H3 HA-nanoparticles 
or empty ferritin nanoparticles (Supplementary Fig. 4b). These results 
indicated that pre-existing anti-H. pylori ferritin immunity did not 
diminish the HA-specific antibody response. 

To address the concern that immunization with ferritin might 
abrogate immune tolerance and induce autoimmunity, we analysed 
T-cell and antibody responses against murine and H. pylori ferritins in 
HA-nanoparticle-immunized mice. Although we found an increase in 
intracellular cytokine staining of CD4+ T cells stimulated with H. 
pylori ferritin peptides (Fig. 2d, left), no increase in CD4+ or CD8+ 
intracellular cytokine staining responses to murine ferritin peptides 
was observed (Fig. 2d, left and middle). In addition, antibodies to 
H. pylori ferritin, but not to murine ferritin, were detected in the immune 
sera (Fig. 2d, right). We therefore found no evidence of immunity to 
autologous ferritin in mice. Moreover, the HA-nanoparticles are 
unlikely to affect iron homeostasis in vivo because of their minimal iron 
incorporation activity (Supplementary Fig. 5). 

We next generated a trivalent vaccine comprising three separately 
purified HA-nanoparticles formulated into a single vaccine dose, 
analogous to a standard TIV. The strains chosen were H1 
(A/California/04/2009, 2009 CA), H3 (2009 Perth) and influenza B 
(B/Florida/04/2006, 2006 FL). All three HA-nanoparticles self-assembled and showed 
morphology similar to 1999 NC HA-nanoparticles (Supplementary 
Fig. 6a). The immunogenicity of multispecific HA-nanoparticles 
(combination of three monospecific HA-nanoparticles) was compared to a 
seasonal TIV containing the same H1 and H3 strains and a mismatched 
influenza B (B/Brisbane/60/2008). HAI titres against homologous 
H1N1 and H3N2 viruses were significantly increased in animals 
immunized with multispecific HA-nanoparticles relative to TIV 
(Supplementary Fig. 6b; P = 0.0125 and 0.0036, respectively). When 
compared to animals immunized with the corresponding monospecific 
HA-nanoparticles, HAI titres against H1N1 and H3N2 viruses induced 
by multispecific HA-nanoparticles were comparable (Supplementary 
Fig. 6b). Therefore no substantial antigenic competition was observed 
with the multispecific HA-nanoparticle vaccine. 

We next examined the immunogenicity of HA-nanoparticles (1999 
NC) in ferrets. Three weeks after the first immunization, all ferrets 
receiving Ribi-adjuvanted HA-nanoparticles generated protective 
HAI titres against homologous virus (>40), whereas only 50% (3/6) 
of Ribi-adjuvanted TIV-immunized ferrets induced titres greater than 
40 (Fig. 3a, left; P = 0.0056). The same difference was also observed for 
both neutralization and ELISA titres (Fig. 3a, middle and right; 
P = 0.0047 and P = 0.0045, respectively), documenting the potency 
of HA-nanoparticles in a second species. After boosting, the HAI, 
IC90 and ELISA titres of the HA-nanoparticle-immune sera were 
approximately tenfold higher than those of TIV-immune sera (Fig. 3a, 
left, middle and right; 457 ± 185 versus 5,760 ± 1,541, P = 0.0066; 
598 ± 229 versus 5,515 ± 1,074, P = 0.0012; and 5,902 ± 1,851 versus 
55,105 ± 13,018, P = 0.0038, respectively). Remarkably, a single 
immunization of HA-nanoparticles induced immune responses comparable 
to two immunizations of TIV (Fig. 3a). 
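
The "approximately tenfold" statement can be checked directly from the mean titres quoted above. The sketch below simply divides each HA-nanoparticle mean by the corresponding TIV mean; the dictionary layout and names are illustrative, and the error terms are ignored.

```python
# Mean post-boost titres quoted in the text: (TIV, HA-nanoparticle).
mean_titres = {
    "HAI": (457, 5760),
    "IC90": (598, 5515),
    "ELISA": (5902, 55105),
}

# Fold increase of HA-nanoparticle sera over TIV sera for each assay.
fold_increase = {
    assay: nanoparticle / tiv for assay, (tiv, nanoparticle) in mean_titres.items()
}

for assay, fold in fold_increase.items():
    print(f"{assay}: {fold:.1f}-fold")
```

The three ratios fall between roughly 9 and 13, consistent with the rounded figure given in the text.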

Figure 3 | Protective immunity induced in ferrets immunized with the 
HA-nanoparticles. a, HAI (left), IC90 (middle) and anti-HA antibody endpoint 
titres (right) against 1999 NC HA. Immune sera were collected after the first (1) 
and second (2) immunizations. The data are presented as box-and-whisker 
plots with lines at the mean (n = 6). b, Protection of immunized ferrets from 
2007 Bris virus challenge. Challenge was performed with 10°° EID50 through 
intranasal inoculation. Virus titres in the nasal washes were determined by 50% 
tissue culture infectious dose (TCID50) assay (left). The mean viral loads with 
s.d. at each time point were plotted (n = 6). Change in body weight after virus 
challenge was monitored (right). Each data point represents the mean percent 
change in body weight from day 0 (pre-challenge) with s.e.m. (n = 6). 

To determine whether HA-nanoparticles could confer protection 
against an unmatched H1N1 virus, immunized ferrets were challenged 
with 2007 Bris virus, which had not yet evolved when 1999 NC circulated 
and required a different seasonal vaccine to confer protection in 
humans. Ferrets immunized with HA-nanoparticles showed a significant 
reduction in viral shedding beginning 1 day after challenge compared 
to the sham control group (Fig. 3b, left; P = 0.0259). At the same 
time point, no significant reduction in viral shedding was seen in the 
TIV-immunized group (Fig. 3b, left; P = 0.4665). In addition, 
HA-nanoparticle-immunized ferrets suffered less weight loss compared 
to the TIV-immunized and sham control animals (Fig. 3b, right), 
further demonstrating the protective efficacy of HA-nanoparticles. 
Interestingly, unlike the TIV-immune sera, which preferentially 
neutralized homologous 1999 NC and another closely related strain 
(A/Beijing/262/1995, 1995 Beijing), sera from HA-nanoparticle-immunized 
ferrets broadly neutralized heterologous 1986 Sing, 1995 
Beijing, A/Solomon Islands/3/2006 (2006 SI) and 2007 Bris viruses 
(Fig. 4a, left). The recent identification of two classes of broadly 
neutralizing antibodies that target the highly conserved but vulnerable 
regions of HA suggests a potential pathway to develop influenza vaccines 
with broad coverage. One class of broadly neutralizing antibodies 
typified by CR6261 recognizes a hydrophobic groove on the HA stem and 
neutralizes virus by inhibiting membrane fusion. The second class 
recognizes the receptor binding site (RBS) on the HA head and inhibits 
viral entry. To determine whether the cross-reactivity across diverse 
H1N1 strains induced by HA-nanoparticles included neutralizing 
antibodies to the HA stem epitope, ferret immune sera were 
pre-absorbed with cells expressing a stem mutant (ΔStem) HA to remove 
non-stem-directed antibodies and analysed for binding to wild-type or 
ΔStem HA as previously described. Stem-specific antibodies were 
detected in HA-nanoparticle-immunized ferrets (6/6) in greater 
frequency and magnitude than TIV-immunized ferrets (2/6) (Fig. 4b, left; 
P = 0.0056). Moreover, binding of these pre-absorbed sera to HA was 
reduced by CR6261 (Fig. 4b, right; P = 0.0019), further documenting the 
presence of antibodies targeting the same epitope as CR6261. The HAI 
titres against heterologous 2007 Bris virus were also significantly higher 
in ferrets immunized with HA-nanoparticles (6/6) than with TIV (3/6) 
(Fig. 4a, right; P = 0.0054). Interestingly, HA-nanoparticle-immune 
sera have HAI responses against a highly divergent 1934 PR8 strain, 
with titres ≥ 40 in all ferrets. However, no HAI titres against 1934 PR8 
were detected in TIV-immunized ferrets (Fig. 4a, right). These data 
indicated that the HA-nanoparticles might elicit another class of 
neutralizing antibody directed towards the conserved RBS in the HA head. 
To dissect the specificity of the RBS-directed antibody response, we 
generated an RBS mutant HA (ΔRBS) by introducing a glycosylation site 
in the sialic acid binding pocket at residue 190 (Supplementary Fig. 7). 
Ferret immune sera were absorbed with ΔRBS HA-expressing cells to 
remove antibodies to HA outside of this region and tested for binding 
against wild-type or ΔRBS HA. RBS-directed antibodies were detected 
with titres of >2,000 in all HA-nanoparticle-immunized ferrets, but only 
in 1 of 6 ferrets that received TIV (Fig. 4b, middle). 

Figure 4 | Improved neutralization breadth and detection of stem- and 
RBS-directed antibodies. a, Breadth of serum neutralization in immune 
ferrets. IC50 titres against a panel of H1N1 pseudotyped viruses (left) and HAI 
titres against 1934 PR8 and 2007 Bris H1N1 viruses (right) were determined. 
Heat map is coloured as in Fig. 2. HAI titres are presented as box-and-whisker 
plots with lines at the mean (n = 6). b, Stem- and RBS-directed antibodies 
elicited by HA-nanoparticle immunization. Immune sera were pre-absorbed 
with ΔStem (left) and ΔRBS (middle) HA-expressing cells and analysed for 
their binding to wild-type and a respective mutant (Δ) HA. The mean endpoint 
titres were plotted with s.d. (n = 6). Binding of ΔStem HA pre-absorbed 
immune sera to HA pre-incubated with a control or CR6261 monoclonal 
antibody (right). Each symbol represents the titre of an individual ferret 
(n = 6). c, Neutralization competition with wild-type, ΔStem or ΔRBS HA 
(left). The neutralization of HA-nanoparticle-immune sera was measured in 
the presence of indicated competitor proteins. Percent neutralization at serum 
dilutions of 1/200 (1986 Sing and 2007 Bris), 1/800 (1995 Beijing) or 1/3,200 
(1999 NC) was plotted. Each symbol represents an individual ferret, and the 
mean is indicated as a red line with s.d. (n = 6 except for 2007 Bris (n = 3)). The 
relative contributions of stem- and RBS-directed neutralization were calculated 
and plotted as the mean percentage (n = 6). 

To define the relative contributions of stem- and RBS-directed 
antibodies to the breadth of neutralization, we performed neutralization 
assays in the presence of competitor proteins: wild-type, ΔStem or 
ΔRBS HA. In the presence of excess ΔStem HA, only stem-directed 
antibodies can neutralize viruses; similarly, ΔRBS HA interferes with 
all antibodies except those targeting the RBS. Four H1N1 strains were 
tested in this assay and the pattern of neutralization inhibition varied 
by strain. Neutralization of 1999 NC or 2007 Bris was mediated 
predominantly by RBS-directed antibodies. However, neutralization of 
1986 Sing was due mainly to stem-directed antibodies. Interestingly, 
both stem- and RBS-directed antibodies contributed to neutralize 
1995 Beijing virus (Fig. 4c). Whereas the neutralization specificities 
of mouse and ferret antibodies differ somewhat from previously 
described human anti-stem antibodies, we have observed these 
differences previously. This variation is most likely due to differences in the 
origins of the IGHV genes that give rise to them and affect their fine 
specificity. In particular, anti-stem antibodies isolated from humans 
derive predominantly from the IGHV1-69 (VH1-69) gene, which is 
not present in other species. Even different human VH1-69-derived 
anti-stem antibodies show differences in breadth and fine specificity 
among influenza subtypes. 

Based on the premise that highly ordered repetitive arrays induce yet 
stronger immune responses (ref. 24), we have successfully designed an 
HA-nanoparticle to present trimeric HA spike in its native conformation, 
rigidly and symmetrically, with sufficient spacing to ensure optimal 
access to potential broadly neutralizing antibodies directed to the stem. 
These nanoparticles not only had the desired physical properties but also 
enhanced the potency and breadth of neutralizing antibody responses 
compared to TIV, the current commercial vaccine, and they were directed 
to two independent highly conserved epitopes. Although not yet a 
universal influenza vaccine, these nanoparticles provide a major increment 
in influenza protection by eliciting potent neutralizing antibodies 
against a broad spectrum of H1N1 viruses. Moreover, the synthetic 
nanoparticles are fully recombinant, eliminating the need to produce 
potentially dangerous virus in eggs or in cell culture, and allowing for 
modifications that improve immunogenicity which would otherwise not 
be tolerated in replication-competent viruses currently used to 
manufacture vaccines. HA-nanoparticle technology therefore represents a 
foundation for a new generation of influenza vaccines and could be 
adapted to create analogous vaccines for a wide variety of pathogens. 


METHODS SUMMARY 


All genes used for recombinant proteins and pseudoviruses were synthesized using 
mammalian preferred codons. The HA-ferritin fusion gene was generated by 
fusing the ectodomain of HA (residues HA1 1-HA2 174, H3 numbering system) 
to H. pylori ferritin (residues 5-167) with a Ser-Gly-Gly linker. Recombinant 
proteins were produced by transient transfection of expression vectors in 293F 
cells (Invitrogen) and purified by chromatography techniques (see Methods for 
detail). The TIVs used in this study were 2006-2007 and 2011-2012 Fluzone 
(Sanofi Pasteur). Animal experiments were carried out in accordance with all 
federal regulations and NIH guidelines. Mice were immunized intramuscularly 
twice with 0.17 µg (Fig. 2a and b) or 1.67 µg (Fig. 2d) of HA-nanoparticles (HA 
amount) or matched amount of TIV with or without Ribi adjuvant system (Sigma) 
or with MF59 (Novartis) at a 3-week interval. Ferrets were immunized 
intramuscularly with 2.5 µg of HA-nanoparticles or 7.5 µg of TIV with Ribi at weeks 0 and 4. 
H1N1 virus challenge was performed 5 weeks after the last immunization with 
10°° 50% egg infectious dose (EID50) of 2007 Bris virus via intranasal inoculation. 
Statistical analyses were performed using Prism 5 (GraphPad Software). 


Full Methods and any associated references are available in the online version of 
the paper. 


Received 28 August 2012; accepted 18 April 2013. 
Published online 22 May 2013. 


1. Salomon, R. & Webster, R. G. The influenza virus enigma. Cell 136, 402-410 
(2009). 
2. Lambert, L. C. & Fauci, A. S. Influenza vaccines for the future. N. Engl. J. Med. 363, 
2036-2044 (2010). 
3. Yamashita, I., Iwahori, K. & Kumagai, S. Ferritin in the field of nanodevices. Biochim. 
Biophys. Acta 1800, 846-857 (2010). 
4. Ekiert, D. C. et al. Antibody recognition of a highly conserved influenza virus 
epitope. Science 324, 246-251 (2009). 
5. Sui, J. et al. Structural and functional bases for broad-spectrum neutralization of 
avian and human influenza A viruses. Nature Struct. Mol. Biol. 16, 265-273 (2009). 
6. Whittle, J. R. et al. Broadly neutralizing human antibody that recognizes the 
receptor-binding pocket of influenza virus hemagglutinin. Proc. Natl Acad. Sci. USA 
108, 14216-14221 (2011). 
7. Ekiert, D. C. et al. Cross-neutralization of influenza A viruses mediated by a single 
antibody loop. Nature 489, 526-532 (2012). 
8. Wei, C. J. et al. Induction of broadly neutralizing H1N1 influenza antibodies by 
vaccination. Science 329, 1060-1064 (2010). 
9. Ledgerwood, J. E. et al. DNA priming and influenza vaccine immunogenicity: two 
phase 1 open label randomised clinical trials. Lancet Infect. Dis. 11, 916-924 
(2011). 
10. Lee, L. A. & Wang, Q. Adaptations of nanoscale viruses and other protein cages for 
medical applications. Nanomedicine 2, 137-149 (2006). 
11. Li, C. Q., Soistman, E. & Carter, D. C. Ferritin nanoparticle technology. A new platform 
for antigen presentation and vaccine development. Ind. Biotechnol. 2, 143-147 
(2006). 
12. Meldrum, F. C., Heywood, B. R. & Mann, S. Magnetoferritin: in vitro synthesis of a 
novel magnetic protein. Science 257, 522-523 (1992). 
13. Jääskeläinen, A. et al. Production of apoferritin-based bioinorganic hybrid 
nanoparticles by bacterial fermentation followed by self-assembly. Small 3, 
1362-1367 (2007). 
14. Cho, K. J. et al. The crystal structure of ferritin from Helicobacter pylori reveals 
unusual conformational changes for iron uptake. J. Mol. Biol. 390, 83-98 (2009). 
15. O'Hagan, D. T., Ott, G. S., Nest, G. V., Rappuoli, R. & Giudice, G. D. The history of 
MF59 adjuvant: a phoenix that arose from the ashes. Expert Rev. Vaccines 12, 
13-30 (2013). 
16. Mbow, M. L., De Gregorio, E., Valiante, N. M. & Rappuoli, R. New adjuvants for 
human vaccines. Curr. Opin. Immunol. 22, 411-416 (2010). 
17. Nabel, G. J. & Fauci, A. S. Induction of unnatural immunity: prospects for a broadly 
protective universal influenza vaccine. Nature Med. 16, 1389-1391 (2010). 
18. Okuno, Y., Isegawa, Y., Sasao, F. & Ueda, S. A common neutralizing epitope 
conserved between the hemagglutinins of influenza A virus H1 and H2 strains. 
J. Virol. 67, 2552-2558 (1993). 
19. Corti, D. et al. Heterosubtypic neutralizing antibodies are produced by individuals 
immunized with a seasonal influenza vaccine. J. Clin. Invest. 120, 1663-1673 
(2010). 
20. Corti, D. et al. A neutralizing antibody selected from plasma cells that binds to 
group 1 and group 2 influenza A hemagglutinins. Science 333, 850-856 (2011). 
21. Ekiert, D. C. et al. A highly conserved neutralizing epitope on group 2 influenza A 
viruses. Science 333, 843-850 (2011). 
22. Krause, J. C. et al. A broadly neutralizing human monoclonal antibody that 
recognizes a conserved, novel epitope on the globular head of influenza H1N1 
virus hemagglutinin. J. Virol. 85, 10905-10908 (2011). 
23. Lingwood, D. et al. Structural and genetic basis for development of broadly 
neutralizing influenza antibodies. Nature 489, 566-570 (2012). 
24. Bachmann, M. F. & Zinkernagel, R. M. Neutralizing antiviral B cell responses. Annu. 
Rev. Immunol. 15, 235-270 (1997). 

4 JULY 2013 | VOL 499 | NATURE | 105 
©2013 Macmillan Publishers Limited. All rights reserved 

Supplementary Information is available in the online version of the paper. 


Acknowledgements We thank H. Andersen, A. Taylor, A. Zajac and C. Chiedi for help 
with the animal studies; U. Baxa, K. Nagashima and A. Harned for electron microscopy 
studies; X. Chen for technical support; A. Panet, B. Graham, R. Schwartz and members 
of the Nabel lab for discussions; S. Sun and M. Rossmann for technical and conceptual 
advice; A. Tislerics, B. Hartman and J. Farrar for manuscript preparation. The MF59 
adjuvant was kindly provided by Novartis. This work was supported by the Intramural 
Research Program of the Vaccine Research Center, NIAID, National Institutes of Health. 

Author Contributions M.K., J.C.B. and G.J.N. developed the concept of HA-ferritin 
nanoparticles; M.K., C.-J.W. and G.J.N. designed the research studies; M.K., C.-J.W., 
H.M.Y., P.M.M., J.C.B., J.R.R.W., W.-P.K., L.W. and G.J.N. performed the research and 
analysed data; M.K., C.-J.W., H.M.Y., P.M.M., J.C.B., J.R.R.W. and G.J.N. discussed the 
results and implications; S.S.R. assisted in animal studies and sample collection; M.K., 
C.-J.W., J.C.B. and G.J.N. wrote the paper and all authors participated in manuscript 
revisions. 

Author Information The authors declare that an intellectual property application has 
been filed by NIH based on data presented in this paper. Reprints and permissions 
information is available at www.nature.com/reprints. The authors declare competing 
financial interests: details accompany the full-text HTML version of the paper at 
www.nature.com/nature. Readers are welcome to comment on the online version of the 
paper. Correspondence and requests for materials should be addressed to G.J.N. 
(Gary.Nabel@sanofi.com). 


METHODS 

Vector construction. The gene encoding Helicobacter pylori non-haem 
iron-containing ferritin (GenBank NP_223316) with a point mutation (N19Q) to 
abolish a potential N-linked glycosylation site was synthesized by PCR-based accurate 
synthesis using human-preferred codons. The human CD5 leader sequence and 
a serine-glycine-glycine (Ser-Gly-Gly) spacer were fused to the gene fragment 
encoding ferritin (residues 5-167) to generate a secreted protein. The plasmids 
encoding various influenza virus HAs, including A/South Carolina/1/1918 (1918 
SC), A/Puerto Rico/8/1934 (1934 PR8), A/Singapore/6/1986 (1986 Sing), 
A/Beijing/262/1995 (1995 Beijing), A/New Caledonia/20/1999 (1999 NC), A/Solomon 
Islands/3/2006 (2006 SI), A/Brisbane/59/2007 (2007 Bris), A/California/04/2009 
(2009 CA), A/Perth/16/2009 (H3 2009 Perth), B/Florida/04/2006 (B 2006 
Florida), and their corresponding neuraminidases (NAs) with human preferred 
codons were synthesized as previously reported. HA-ferritin fusion genes were 
generated by fusing the ectodomain of HAs (residues HA1 1-HA2 174, H3 
numbering) from 1999 NC, 2009 CA, H3 2009 Perth and B 2006 Florida to H. pylori ferritin 
(residues 5-167) with a Ser-Gly-Gly linker. Transmembrane and soluble forms of 
1999 NC ΔStem and ΔRBS HA mutants were generated by introducing an 
N-linked glycosylation site at residues HA2 45 (I45N/G47T) and HA1 190 
(R192T), respectively. The soluble form of 2007 Bris ΔRBS HA mutant was 
generated by introducing an N-linked glycosylation site at the same site. All genes were 
then cloned into mammalian expression vectors for efficient expression. Plasmids 
encoding the monoclonal antibodies CR6261 (ref. 4), CH65 (ref. 6) and a 
single-chain variable fragment (scFv) F10 (ref. 5) were synthesized as described previously. 
Protein biosynthesis and purification. To produce ferritin nanoparticles, HA- 
nanoparticles and trimeric HA, the expression vectors were transfected into 293F 
cells (Invitrogen) using 293fectin (Invitrogen) according to the manufacturer’s 
instructions. Matched NAs were initially co-transfected at 20:1 HA:NA (w/w) to 
minimize the self-aggregation of HAs often observed with the soluble trimeric HA 
proteins, although further analysis showed that proper formation of the HA- 
nanoparticles did not require NA co-expression. The cells were grown in 
Freestyle 293 expression medium (Invitrogen) and the culture supernatants were 
collected 4 days post-transfection. The supernatants were concentrated and then 
buffer-exchanged to a Tris buffer (20 mM Tris, 50 mM NaCl, pH 7.5 for ferritin
nanoparticles; 20 mM Tris, 500 mM NaCl, pH 7.5 for HA-nanoparticles). The
ferritin nanoparticles were purified by ion-exchange chromatography using a 
HiLoad 16/10 Q Sepharose HP column (GE Healthcare). The HA-nanoparticles 
were purified by affinity column chromatography using Erythrina cristagalli 
agglutinin (ECA, coral tree lectin; EY Laboratories, Inc.) specific for galactose 
β-(1,4)-N-acetylglucosamine. The ferritin nanoparticles and HA-nanoparticles
were further purified by size exclusion chromatography with a Superose 6 PG 
XK 16/70 column (GE Healthcare) in PBS. The molecular weights of the ferritin 
nanoparticle and HA-nanoparticles were calculated based on two equations gen- 
erated by least squares linear regression on a semi-log plot using gel filtration low 
and high molecular weight standards (Bio-Rad), respectively. The yield of the
HA-nanoparticles was typically 2-10 mg l⁻¹ depending on the HA strain. The
trimeric HA proteins were purified as described previously”’. Protein purity and size
were verified by SDS-PAGE and dynamic light scattering using a DynaPro system 
(Wyatt Technology). Monoclonal antibodies and F10 scFv were produced in 293F 
cells and purified as described previously*”*. Monoclonal antibodies against 1999 
NC HA were purified from hybridoma supernatants as previously described*. 
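
As an illustration of the molecular-weight calibration described above, the following minimal Python sketch fits a least-squares line to log10(molecular weight) against elution volume and then reads off an unknown. The function names and the standard values are hypothetical and are not taken from this study; the text reports two such equations, one fitted to the low and one to the high molecular weight standards, so in practice the fit would be run once per standard set.

```python
import math


def fit_semilog_calibration(standards):
    """Least-squares fit of log10(MW) = slope * elution_volume + intercept.

    `standards` is a list of (elution_volume_ml, molecular_weight_da) pairs.
    """
    xs = [volume for volume, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_mw(elution_volume_ml, slope, intercept):
    """Convert an elution volume back to a molecular weight (Da)."""
    return 10 ** (slope * elution_volume_ml + intercept)


# Hypothetical standards (elution volume in ml, molecular weight in Da).
HYPOTHETICAL_STANDARDS = [
    (60.0, 670_000.0),
    (75.0, 158_000.0),
    (85.0, 44_000.0),
    (95.0, 17_000.0),
]
slope, intercept = fit_semilog_calibration(HYPOTHETICAL_STANDARDS)
estimated_mw = estimate_mw(70.0, slope, intercept)
```

Larger species elute earlier, so the fitted slope is negative; a particle eluting outside the range of its standard set should not be extrapolated from that set's line.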
Electron microscopic analysis. Purified ferritin nanoparticles and HA-nanopar- 
ticles were negatively stained with phosphotungstic acid and ammonium molyb- 
date, respectively, and images were recorded on a Tecnai T12 microscope (FEI) at 
80 kV with a CCD camera (AMT Corp.). 

Haemagglutinin inhibition assay (HAI). Seed stocks of the influenza viruses 
were obtained from the Centers for Disease Control and Prevention (Atlanta, 
Georgia, USA) and the viruses were expanded in embryonated chicken eggs or 
in Madin-Darby canine kidney (MDCK) cells. Immune sera were pretreated with 
receptor-destroying enzyme (RDE II; Denka Seiken Co., Ltd) and HAI assays were 
performed using four haemagglutinating units per well and 0.5% turkey or chicken 
red blood cells. 

Pseudotyped virus neutralization and protein competition assays. The pseu- 
dotype neutralization assay was performed as previously described and has been 
widely accepted for defining the specificity of neutralizing antibodies targeting 
influenza virus HAs**°”’. For the protein competition assay, neutralizing activity 
of the F10, CR6261 or immune sera was measured in the presence of competitor 
proteins, trimeric HA (wild type, ΔStem or ΔRBS), HA-nanoparticles, ferritin
nanoparticles or irrelevant protein (HIV-1 gp120) at final concentrations of 20
and 25 µg ml⁻¹ for monoclonal antibodies and immune sera, respectively.
ELISA. Purified trimeric HA, HA-nanoparticles and TIV (2 µg of H1 HA ml⁻¹),
ferritin nanoparticles (0.68 µg ml⁻¹ for Supplementary Fig. 2 or 2 µg ml⁻¹ for the
rest), mouse liver ferritin (2 µg ml⁻¹; Alpha Diagnostic International, Inc.), and ΔStem
and ΔRBS HA trimers (2 µg ml⁻¹) were coated (100 µl per well) onto MaxiSorp
plates (Nunc). For the ELISA-based competition assay, HA trimer (2 µg ml⁻¹) was
coated onto the plates and the plates were incubated with CR6261 or an isotype
control (VRC01)*°*! at 8 µg ml⁻¹ before adding serially diluted pre-absorbed
ferret immune sera.

Intracellular cytokine staining assay. CD4⁺ and CD8⁺ T-cell responses were
evaluated by intracellular cytokine staining for interferon-γ (IFN-γ), tumour necrosis
factor-α (TNF-α) and interleukin-2 (IL-2) as described previously’. We used
individual peptide pools (15-mers overlapping by 11 residues, 2.5 µg ml⁻¹ for each
peptide) covering H. pylori ferritin or murine ferritin light and heavy chains to
stimulate cells.

Immunization. The TIV used in this study was 2006-2007 Fluzone (Sanofi Pasteur) 
containing HAs from A/New Caledonia/20/1999 (H1N1), A/Wisconsin/67/2005 
(H3N2) and B/Malaysia/2504/2004 (influenza B), or 2011-2012 Fluzone with 
HAs from A/California/07/2009-like (H1N1), A/Perth/16/2009 (H3N2) and 
B/Brisbane/60/2008 (influenza B) (Supplementary Fig. 6). The TIV split vaccines 
are treated with a detergent (octoxinol-9) to solubilize membranes on influenza 
viruses and form rosettes that contain full-length HAs and NAs. Female BALB/c 
mice (6-8 weeks old; Charles River Laboratories) were immunized (five mice per 
group) with 0.5 µg (0.17 µg of H1 HA) or 0.22 µg (0.17 µg of HA) of TIV or
HA-nanoparticles, respectively (Fig. 2a, b), or 5 µg (1.67 µg of H1 HA), 2.24 µg (1.67 µg of
HA) or 0.57 µg (equimolar to HA-nanoparticles) of TIV, HA-nanoparticles or
ferritin nanoparticles, respectively (Fig. 2d). All immunizations were given
intramuscularly in 100 µl of PBS or in 100 µl of a 50% (v/v) mixture of Ribi adjuvant system
(Sigma) in PBS at weeks 0 and 3. In a separate experiment, MF59 (Novartis) was used
as the adjuvant in place of Ribi. A group of BALB/c mice (n = 4) was immunized
with 20 µg of trimeric HA with Ribi adjuvant at weeks 0 and 4. For the experiment in
Supplementary Fig. 6b, mice were immunized (n = 5) with 5 µg (1.67 µg of each HA
component) of TIV, 2.24 µg (1.67 µg of HA) of monospecific HA-nanoparticles or
6.72 µg (1.67 µg of each HA component) of multispecific HA-nanoparticles with
Ribi adjuvant at weeks 0 and 3. Blood samples were collected before the first dose, and
at 2 weeks after each immunization. For ferret studies, male Fitch ferrets (6 months old;
Triple F Farms), seronegative for H1N1, H3N2 and influenza B viruses, were housed
and cared for at BIOQUAL, Inc. (Rockville, MD). Ferrets were immunized (six ferrets
per group) intramuscularly with 500 µl of PBS, 7.5 µg (2.5 µg of H1 HA) of TIV or
3.35 µg (2.5 µg of HA) of HA-nanoparticles in 500 µl of a 50% (v/v) mixture of Ribi
adjuvant in PBS at weeks 0 and 4. Blood was collected before the first dose and 3 and
2 weeks after the first and the second immunizations, respectively. Animal experi- 
ments were carried out in accordance with all federal regulations and NIH guidelines. 
Virus challenge. Five weeks after the last immunization, the ferrets were
challenged with 10°° 50% egg infectious dose (EID₅₀) of 2007 Bris virus. The virus was
expanded in embryonated chicken eggs from a seed stock obtained from the Centers
for Disease Control and Prevention (Atlanta, Georgia, USA) and has a titre of 10°°
EID₅₀ ml⁻¹. The ferrets were observed for clinical signs twice daily, and weight and
temperature measurements were recorded daily. Nasal washes were obtained on days 1,
3 and 5 and infectious viral titres were determined by a 50% tissue culture infectious
dose (TCID₅₀) assay using MDCK cells as described previously’.

Serum absorption. Ferret immune sera taken 2 weeks after the second immunization
were subjected to the assay. One ml of the immune sera diluted at 1:100 and
1:1,000 was incubated with 100 µl of pre-washed ΔStem and ΔRBS HA-expressing
293F cell pellets, respectively. After incubating for 1 h at 4 °C, supernatants were
collected by centrifugation and binding to wild-type and mutant HAs was examined
by ELISA. The ΔStem HA-pre-absorbed sera were also used for competition ELISA.
Statistical analysis. All data plotted with error bars are expressed as means with 
s.d. unless otherwise indicated. The P values were generated by analysing data with 
a two-tailed unpaired t-test using the Prism 5 program (GraphPad Software). In
Fig. 4b, right panel, the data were analysed by two-way ANOVA using Prism 5. 
Molecular representations. All structural renderings of proteins were generated 
using the UCSF Chimera package”’, version 1.7.0 (http://www.cgl.ucsf.edu/chimera/) 
or The PyMOL Molecular Graphics System, version 1.5.0.4 (Schrödinger, LLC;
http://www.pymol.org/). UCSF Chimera is developed by the Resource for Bio- 
computing, Visualization, and Informatics at the University of California, San 
Francisco, California, USA (supported by NIGMS P41-GM103311). 

Structural basis for alternating access of a eukaryotic
calcium/proton exchanger


Andrew B. Waight, Bjorn Panyella Pedersen, Avner Schlessinger, Massimiliano Bonomi, Bryant H. Chau, Zygy Roe-Zurz,
Aaron J. Risenmay, Andrej Sali & Robert M. Stroud


Eukaryotic Ca²⁺ regulation involves sequestration into intracellular
organelles, and expeditious Ca²⁺ release into the cytosol is a
hallmark of key signal transduction pathways. Bulk removal of
Ca²⁺ after such signalling events is accomplished by members of
the Ca²⁺:cation (CaCA) superfamily'®. The CaCA superfamily
includes the Na⁺/Ca²⁺ (NCX) and Ca²⁺/H⁺ (CAX) antiporters,
and in mammals the NCX and related proteins constitute families
SLC8 and SLC24, and are responsible for the re-establishment
of Ca²⁺ resting potential in muscle cells, neuronal signalling
and Ca²⁺ reabsorption in the kidney®. The CAX family members
maintain cytosolic Ca²⁺ homeostasis in plants and fungi during
steep rises in intracellular Ca²⁺ due to environmental changes, or
following signal transduction caused by events such as hyperosmotic
shock, hormone response and response to mating pheromones”.
The cytosol-facing conformations within the CaCA superfamily are
unknown, and the transport mechanism remains speculative. Here
we determine a crystal structure of the Saccharomyces cerevisiae
vacuolar Ca²⁺/H⁺ exchanger (Vcx1) at 2.3 Å resolution in a
cytosol-facing, substrate-bound conformation. Vcx1 is the first structure,
to our knowledge, within the CAX family, and it describes the key
cytosol-facing conformation of the CaCA superfamily, providing the
structural basis for a novel alternating access mechanism by which the
CaCA superfamily performs high-throughput Ca²⁺ transport across
membranes.

The CaCA superfamily is defined by the presence of two short,
repeating homologous sequences, termed the α-repeats, found in predicted
transmembrane regions. The α-repeats are opposite in topology
and are believed to have arisen from a gene duplication event’*"*.
Mutagenesis and recent structural data have identified this region as
essential for ion binding and transport, and specifically two key acidic
residues (Glu or Asp) are implicated in coordinating Ca²⁺ ions at the
active site’’’°. Members of the CAX family are approximately 400
residues long with 11 predicted transmembrane helices. The first helix
(MR), found in eukaryotic CAX members, has a regulatory role in
plant members and is suggested to be involved in protein targeting
and/or signalling in yeast'*”’*. The 10 remaining transmembrane
helices (M1-M10) perform the transport function, and are composed
of two symmetrically related halves (M1-M5 and M6-M10) connected
through a negatively charged loop termed the ‘acidic motif’!?'°’.
Saccharomyces cerevisiae Vcx1 catalyses low-affinity (Michaelis constant
(Km) = ~25 µM), high-capacity (maximum rate (Vmax) = ~35 nmol
Ca²⁺ min⁻¹ mg⁻¹) vacuolar Ca²⁺ exchange**”*5. To establish function
of the purified protein, Vcx1 was reconstituted into liposomes and
assayed for Ca²⁺ uptake activity. In this system, purified Vcx1 demonstrates
Ca²⁺ uptake monotonically dependent on pH gradient
(Supplementary Fig. 1). Vcx1 shares ~30% sequence identity with other
members of the Ca²⁺/H⁺ exchanger family, including the canonical
CAX proteins of Arabidopsis thaliana (Supplementary Fig. 2).
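
To give a feel for what the quoted kinetic constants imply, here is a minimal Python sketch that evaluates the standard Michaelis-Menten rate law with Km of about 25 µM and Vmax of about 35 nmol Ca²⁺ min⁻¹ mg⁻¹. It assumes simple hyperbolic kinetics with no cooperativity or pH dependence, which is a simplification; the function and constant names are illustrative only.

```python
KM_UM = 25.0    # approximate Michaelis constant quoted in the text (µM)
VMAX = 35.0     # approximate maximum rate quoted in the text (nmol Ca2+ per min per mg)


def uptake_rate(calcium_um, km_um=KM_UM, vmax=VMAX):
    """Michaelis-Menten rate v = Vmax * [S] / (Km + [S]).

    Assumes simple hyperbolic kinetics; returns the rate in the units of `vmax`.
    """
    if calcium_um < 0:
        raise ValueError("concentration must be non-negative")
    return vmax * calcium_um / (km_um + calcium_um)


# Rate at a concentration equal to Km is half of Vmax by definition.
half_maximal_rate = uptake_rate(KM_UM)
```

Under these assumptions the exchanger runs far below Vmax at sub-micromolar Ca²⁺ and approaches saturation only at concentrations well above 25 µM, which is what "low-affinity, high-capacity" describes.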

Vcx1 was solved experimentally to 2.3 Å resolution (Rfree of 22.5%)
by molecular replacement, supported by iodine-based experimental
phases (Fig. 1, Supplementary Table 1 and Supplementary Fig. 3).
The structure encompasses residues 22-401 with the exception of a
short loop (184-191) between M4 and M5. Two identical (root mean
squared deviation (r.m.s.d.) 0.21 Å over 285 Cα atoms) monomers are
found in the asymmetric unit. Six divalent cations are identified as Ca²⁺
or Mn²⁺ in the Vcx1 monomer, on the basis of their coordination
geometry and anomalous scattering differences (Supplementary Fig. 4).
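
The r.m.s.d. figure quoted above can be illustrated with a short Python sketch. It computes the root mean squared deviation between two equal-length lists of matched Cα coordinates and assumes the two structures have already been optimally superposed; the superposition step itself (for example a Kabsch fit) is not shown, and the function name is hypothetical rather than taken from any package used in this study.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean squared deviation between matched (x, y, z) coordinates in Å.

    Assumes both coordinate sets are already superposed and listed in the
    same atom order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))
```

A value near 0.2 Å over several hundred Cα atoms, as reported for the two monomers, indicates essentially identical backbones.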

The shape of the Vcx1 monomer, viewed perpendicular to the membrane
plane, resembles that of a wedge (Fig. 1). Viewed from the
vacuolar side of the membrane, the tapered end of the wedge consists
of two long antiparallel helices M1 and M6, which are intertwined and
tilted ~30° with respect to the membrane normal. The central four-helix
core contains the α-repeats, and is comprised of M2-M3 and
M7-M8. M2 and M7 are kinked at their midpoints and change direction
by ~35° in the middle of the membrane plane to create M2a/M2b
and M7a/M7b. These two oppositely related helix kinks meet in the
mid-membrane plane, forming an hourglass shape, where the CAX
family display the conserved GNXXE(H) signature sequence necessary
for calcium binding and transport'*”°’*’’. M3 and M8 are also tilted
with respect to the membrane normal and line the interior of the
hourglass. M4-M5 and M9-M10 form the outer components of a
right-handed bundle which flank the central core and constitute the
thicker side of the wedge shape. The 20-residue ‘acidic motif’ connecting
the two duplicated halves of the protein between M5 and M6
is predicted to be disordered based on sequence. However, a clearly
resolved α-helix (which we term the acidic helix) for this sequence is
observed in the structure. This helix is oriented parallel to the membrane
and lies directly underneath the α-repeat regions on the cytosolic side.

A centrally located Ca²⁺ ion occupies the active site of Vcx1, coordinated
by Glu 302 on M7b and Ser 325 on M8 (Fig. 2). The Ser 325
residue is generally conserved throughout the CaCA superfamily, and
in NCX and NCKX family members the analogous serine residue has
been shown to have an important role in Ca²⁺ transport (Supplementary
Fig. 5)'*°. Three ordered water molecules complete the octahedral
coordination geometry of Ca²⁺ (Supplementary Fig. 4b). The presence
of water molecules at the binding site suggests that the Ca²⁺ ion reaches
the active site in a partially hydrated state, balancing the stronger
binding of entropically ordered side chains with more loosely bound
water to complete the coordination sphere. Glu 106, Asn 299 and the
backbone carbonyl of Gly 102 coordinate the three water molecules.
The remainder of the Ca²⁺ active site is stabilized by specific interactions
from polar residues in the transmembrane regions of M2, M3,
M7 and M8. The conserved Asn 299 and His 303 of M7b form
hydrogen bonds to Ser 129 and Ser 132, respectively, of the adjacent
M3 helix, and the conserved Asn 103 on M2b forms a hydrogen bond
to Gln 328 on M8. M2a is bent away from the bundle of helices M2b,
M3, M7 and M8, and in this configuration it is not packed tightly
against the protein body (Fig. 3). M2a and the connected C-terminal
half of M1 are bent away from the active site, exposing the central Ca²⁺
ion to the cytosol. The M2a/M1 arrangement creates a substantial


Figure 1 | Topology and fold of the Vcx1 protein. The symmetrically related 
halves of the Vcx1 monomer are coloured in a double colour spectrum from
the N to C terminus. Helices of matching colour are related by symmetry. 
a-c, The Vcx1 monomer as viewed in the membrane along the axis of 


vestibule that is accessible from the intracellular bulk solvent. This
vestibule is conical in shape and has a negatively charged interior
surface potential (Fig. 3c). The interior of the cavity vestibule is circumscribed
by M2a, the C-terminal half of M1, M7b and the
N-terminal half of M8, and allows access from the cytosol to the central
Ca²⁺ binding site. Thus, the Vcx1 protein structure represents a
substrate-bound, cytosol-facing conformation.

Lying across the cytosolic entrance to the vestibule, the acidic helix
also coordinates two Ca²⁺ ions (Supplementary Fig. 4c). These two
ions lie on the cytosolic side, 11 Å from the central Ca²⁺ site, coordinated
by Asp 234 and Glu 230 of the acidic helix and by Glu 83 of M1
(Fig. 2c). The acidic motif has been suggested to have a role in Ca²⁺




symmetry (a), rotated by 90° (b) and viewed from the vacuolar side of the 
membrane (c). d, Topology map of the Vcx1 monomer; CAX family conserved 
residues are coloured in red, α-repeat sequences are denoted by dashed circles.


binding”. In mammalian NCX members, the analogous region connecting
helices M5 and M6 contains a large intracellular calcium-binding
domain (CBD1) responsible for stimulating activity in the
transporter domain in the presence of Ca²⁺ (ref. 28) (Supplementary
Fig. 5). The CBD1 Ca²⁺ binding sites are similarly formed from
acidic motifs although they coordinate ions using β-sheets rather than
α-helical secondary structures”. Molecular dynamics simulations performed
with the Vcx1 structure suggest that the acidic helix maintains
an α-helical conformation in the presence of the two coordinated Ca²⁺
ions, and becomes more flexible in their absence (Supplementary
Fig. 6). The increased rigidity of the Vcx1 acidic helix at higher
Ca²⁺ concentrations indicates a possible Ca²⁺-dependent regulatory


Figure 2 | Calcium binding sites in the Vcx1 crystal structure. a, Overview of
site 1 and site 2 with helices MR, M1 and M6 removed for clarity. The cytosol is
on the bottom of the image and Ca²⁺ ions are coloured in yellow. b, Active site
Ca²⁺ substrate ion and interacting residues found in site 1. Hydrogen bonds are
shown as dashed lines; numbers denote atomic distances (Å). 2mFo-DFc map
is shown contoured at 1σ (blue mesh). c, Ca²⁺ ions at the acidic helix in site 2
with interacting residues labelled. Hydrogen bonds are shown as dashed lines;
numbers denote atomic distances (Å). 2mFo-DFc map is shown contoured at
1σ (blue mesh).
Figure 3 | The cytoplasmic vestibule. a, The protein cavity is rendered as a
surface representation and is coloured grey; the helices M1 and M2 are coloured 
purple and are shown from the axis of symmetry. b, View rotated by 90°. c, The 
cytoplasmic vestibule as oriented in panel a and depicted with a slab surface 


function for this region, perhaps augmenting conductance in the presence
of increased cytosolic Ca²⁺.

Comparison of the two structural repeats (M1-M5 and M6-M10)
of Vcx1 reveals a structurally similar core region that is closely packed
and rigid (M3-M5, M8-M10) (Supplementary Fig. 7b). In contrast,
considerable differences are found in the M2a helix and C-terminal
half of M1 when compared to M7a and M6. Superposition between
helices M1-M2a and M6-M7a reveals a ~12° and ~7° asymmetric
difference in the angle of M1 and M2a, respectively (Supplementary
Fig. 7c, d). This structural divergence, in combination with loose
packing and intracellular location, implicates this mobile region as the
cytosolic gate. A dynamic straightening of the M1/M2a helices would 
collapse the cytosolic vestibule, and this motion could be coordinated 
by a structural rearrangement into a vacuole-facing conformation. 

The Vcx1 conformation also sheds light on the transport cycle of
CaCA proteins by comparison with the recent structure of an archaebacterial
Na⁺/Ca²⁺ exchanger from Methanococcus jannaschii
(mjNCX)”’. Despite low sequence identity (14%) to Vcx1, the overall
fold and topology of mjNCX are similar. However, unlike Vcx1, the
mjNCX exchanger is closed to the cytosolic environment and instead
represents a periplasm-facing conformation, as reflected in the overall
displacement between similar atoms (r.m.s.d. 5.7 Å over 269 Cα
atoms). Structural alignment of the Vcx1 and mjNCX structures
reveals a similar placement of the core region and of helices M7 and
M2b (Supplementary Fig. 8). However, the M2a helix is shifted by
~16 Å towards the centre of the bilayer in Vcx1 (Fig. 4a). In addition,
relative to the mjNCX structure, the position of both loosely packed


Figure 4 | Transport cycle of Vcx1 and structural comparison to mjNCX. 
a, Comparison of M2 and M7 and active site glutamate residues between Vcx1 
(purple) and mjNCX (cyan). Ca²⁺ ions from each model are depicted as
spheres. b, Comparison of M1 and M6 between Vcx1 (purple) and mjNCX 
(cyan). c, Schematic of Vcx1 turnover. Structures are coloured as in panel 
representation coloured by electrostatic potential (red to blue; -10 to
10 kT e⁻¹). Helices MR, M1 and M6 have been removed for clarity. Ca²⁺ ions
(yellow spheres) pinpoint site 1 and site 2.


helices M1 and M6 are translated diagonally towards the vacuole by
~16 Å and ~13 Å at either end, closing a vacuole-facing portal that
could otherwise expose the active site of the protein to the vacuolar
environment (Fig. 4b and Supplementary Figs 8 and 9). The concerted
transposition in the M1/M6 helices therefore performs a dual role of
coordinating motions between the α-repeats and covering/uncovering
an active site entry passage (Fig. 4c). By this mechanism, translational
movements of the M1/M6 helices allow alternating access to the active
site of Vcx1 from both sides of the membrane. The action of the
M1/M6 helices is therefore analogous to the piston of a two-stroke engine
that occludes and exposes intake and efflux pathways during each
turnover. Using a predicted cytosol-facing mjNCX conformation, a
similar motion of the M1 and M6 helices was suggested for turnover by
the mjNCX monomer”’. Our data augment the proposed mechanism
by including structural evidence for the M1/M6 translations and substantial
conformational changes in the M2a helix. With the addition of
a cytoplasmic-facing Vcx1 structure, there are now two key states in the
CaCA family that suggest a trajectory for Ca²⁺ translocation, forming a
strong case for the two-stroke mechanism of alternating access.

The proposed transport cycle of Vcx1 is shown in Fig. 4c (Supplementary
Video 1). In the active site, Glu 106 and Glu 302 are
exposed to the vacuolar side (pH ~5-6 (ref. 30)). The proton motive
gradient across the vacuolar membrane provides the source of energy
to drive a conformational change to the cytosol-facing conformation
whereupon the glutamate residues would be expected to maintain a
negative charge (at pH ~7 (ref. 30)). Under conditions of high cytosolic
Ca²⁺ concentration, as seen during signal transduction events,




a. Proposed substrate movement is denoted by black arrows, and calcium by 
yellow circles. Red arrows show protein movement in the cytosol-facing state of 
Vcx1 (left) that results in the vacuole-facing conformation on the right. Return
to the cytosol-facing state presumably requires reversal of the movements 
denoted by the red arrows. 
Ca²⁺ is coordinated by the acidic helix, and Ca²⁺ is able to reach the
active site. The Vcx1 side chains of Glu 302 and Ser 325 partially
replace the Ca²⁺ hydration shell, and subsequent completion of coordination
by Glu 106 displaces some of the remaining water molecules
to bring helix M2b inward towards the active site. This movement of
M2 towards the core can initiate M2a straightening and M1/M6 translation,
closing the cytosolic vestibule. The translation of helices M1/M6
uncovers a vacuolar cleft and coordinates opening of M7a to expose
the active site Ca²⁺ ion to the vacuole. The vacuole-facing conformation,
in combination with the acidic pH in the vacuole, lowers the Ca²⁺
affinity of active site residues Glu 106 and Glu 302, leading to release of
the Ca²⁺ substrate into the vacuole. The cyclical pumping action of the
M1/M6 ‘piston’, coupled to flexible helices surrounding the active site
(M2a, M7a), provides an efficient framework for the rapid turnover
necessary for high-throughput Ca²⁺ exchange.
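
The pH argument in the transport cycle above can be made concrete with the Henderson-Hasselbalch relation, sketched here in Python. The sketch computes the fraction of a carboxylate group that is deprotonated (negatively charged) at a given pH. The pKa of 6.0 used in the example is purely an assumption for illustration: the effective pKa values of Glu 106 and Glu 302 inside the protein are not reported in the text and may differ substantially from this number.

```python
def fraction_deprotonated(ph, pka):
    """Henderson-Hasselbalch: fraction of an acid in its charged (A-) form."""
    return 1.0 / (1.0 + 10 ** (pka - ph))


# Hypothetical effective pKa for a buried glutamate; NOT a value from the paper.
ASSUMED_PKA = 6.0

charged_at_cytosolic_ph = fraction_deprotonated(7.0, ASSUMED_PKA)   # cytosol, pH ~7
charged_at_vacuolar_ph = fraction_deprotonated(5.5, ASSUMED_PKA)    # vacuole, pH ~5-6
```

With this assumed pKa, most of the glutamate population would carry a negative charge at cytosolic pH and a minority at vacuolar pH, which is the direction of change the proposed mechanism relies on for Ca²⁺ release; the size of the effect depends entirely on the true pKa.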


In conclusion, Vcx1 is the first CAX family structure, and the first
structure of the CaCA superfamily in a cytosol-facing conformation. It
provides a structural basis for an alternating access mechanism for the
Vcx1 protein and the CaCA superfamily in general. These findings lay
the groundwork for future exploration of Ca²⁺ transport by CaCA
superfamily members and lend insight into fundamental aspects of
Ca²⁺ homeostasis and eukaryotic signal transduction processes.


METHODS SUMMARY 


The Vcx1 protein from Saccharomyces cerevisiae (Uniprot ID Q99385) was
expressed in S. cerevisiae and purified using a decahistidine affinity tag.
Solubilization and purification were performed using dodecyl-β-D-maltoside. Crystals were
grown in meso by combining the lipidic cubic phase technique and Jeffamine M-600
sponge phase conditions with vapour phase diffusion. X-ray diffraction data were
collected at the Advanced Light Source beamline 8.3.1, Advanced Photon Source
beamline 23-ID-B and Stanford Synchrotron Radiation Lightsource beamline
12-2. The structures were solved by single-wavelength anomalous diffraction and
molecular replacement methods (MR-SAD) using mjNCX (Protein Data Bank
accession 3V5U (ref. 17)) as a search model. The final structural model was refined
using data to 2.3 Å to a crystallographic R-factor of 20.1% and free R-factor of 22.5%
(Supplementary Table 1).


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS


Expression and purification. The Vcx1 protein from Saccharomyces cerevisiae
(Uniprot ID Q99385) was incorporated into the 2µ expression plasmid
p423-GAL1 modified with N-terminal and C-terminal purification tags, as described*".
Transformed S. cerevisiae (strain DSY-5; MAT his3::GAL1-GAL4 pep4
prb1-1122) were grown in a fermenter culture vessel (Biostat C15L Sartorius AG) to
high density, induction was performed via fed-batch using 40% galactose and cells were
harvested after 18-22 h. Harvested yeast (~1.8-2 kg wet cell weight) were washed
in cold water, pelleted (6,000 r.p.m.) and flash frozen for storage at -80 °C. Frozen
pellets were thawed in lysis buffer (100 mM Tris pH 7.0, 700 mM NaCl, 1 mM
phenylmethylsulphonyl fluoride (PMSF) + protease inhibitors) before cell disruption
using a bead mill. The homogenate was centrifuged for 25 min at 21,600g, followed
by sedimentation of membranes via ultracentrifugation at 185,000g for 150 min.
Membrane pellets were re-suspended in membrane re-suspension buffer (50 mM
Tris pH 7.0, 600 mM NaCl, 20% glycerol) before being frozen in liquid nitrogen in
7-g aliquots. One-hundred grams of yeast cell material yielded an average of
20-25 g membrane. Membrane aliquots (8 g) were thawed and suspended in
112 ml membrane solubilization buffer (50 mM Tris pH 7.0, 600 mM NaCl,
20 mM CaCl₂, 10% glycerol, 1 mM PMSF + protease inhibitors) and solubilized
using 1,380 mg n-dodecyl-β-D-maltoside (DDM) (1:0.19 (w/w) ratio) for 30 min
at 4 °C, followed by centrifugation at 120,000g for 30 min to remove unsolubilized
material. The resultant lysate was supplemented with 8 mM imidazole pH 6.5 and
incubated for ~3 h with 6 ml pre-equilibrated TALON Co²⁺ resin. After incubation,
the beads were collected by gravity flow-through using a Bio-Rad
econo-column and washed twice via nutation with 30 ml buffer A (50 mM Tris pH 7.0,
0.1% DDM, 20 mM CaCl₂, 5 mM MnCl₂, 10% glycerol) supplemented with
15 mM and 30 mM imidazole pH 6.5, respectively. The protein was eluted from
the beads using three elutions of 5 ml buffer A supplemented with 500 mM imidazole
pH 6.5. The elutions were pooled, bovine thrombin and 3C protease were added to
cleave the tags, and the sample was dialysed with a 25-kDa cutoff into 1 l of dialysis buffer (50 mM
MES pH 6.0, 20 mM CaCl₂, 5 mM MnCl₂, 10% glycerol) overnight at 4 °C. The
following day the eluate was concentrated to 500 µl and injected onto a
size-exclusion column (Superdex 200, GE Healthcare) equilibrated in SEC buffer
(10 mM MES pH 6.0, 0.05% DDM, 20 mM CaCl₂, 5 mM MnCl₂). Peak fractions
were collected and concentrated to ~30 mg ml⁻¹.

Reconstitution and transport assay. Ca²⁺ uptake into proteoliposomes using
purified Vcx1 protein was performed primarily using the method described
previously*’. In brief, 10 mg of yeast polar lipid extract (Avanti Polar Lipids)
was dried under nitrogen and re-suspended into 10 mM MOPS pH 6.5, 100 mM
choline chloride, and 100 µM Fura-2** (Sigma-Aldrich). The re-suspension was
sonicated to transparency, subjected to 10 cycles of freeze-thaw, and then
extruded through a 400-nm filter 10 times. The resulting liposomes were destabilized
by addition of 1% octyl β-D-glucopyranoside (OG) and purified Vcx1 was
added in a 1:500 (w/w) ratio and incubated for 1 h. OG was removed by addition of
200 mg ml⁻¹ Bio-Beads (Bio-Rad) for 3 h and replaced with 200 mg ml⁻¹ fresh
Bio-Beads for incubation overnight. Proteoliposomes were harvested by centrifugation
at 66,000g for 150 min and re-suspended in 10 mM MOPS pH 6.5, 100 mM
choline chloride. The proton gradient was initiated by the addition of 20 µl proteoliposome
to 20 µl reaction buffer containing 100 mM choline chloride and 10 mM
MOPS pH 7.9, 7.2 or 6.5 (final pH 7.1, 6.8, 6.5) in a Corning 384-well clear-bottom
microplate. Transport activity followed the addition of 100 µM CaCl₂ and uptake
of Ca²⁺ was monitored at 22 °C via the changes in emission of Fura-2 at 510 nm
upon excitation at 340 and 380 nm at 10-s intervals using a Molecular Devices
SpectraMax microplate reader. Maximal signal was obtained via addition of 0.3%
DDM; the ratio of emission intensities at the two excitation wavelengths was
converted to Ca²⁺ using a standard curve and previously described methods.
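
The ratiometric conversion described above can be sketched in Python as follows. The sketch forms the 340/380 nm excitation ratio from the two 510 nm emission readings and maps it to a Ca²⁺ concentration by piecewise-linear interpolation on a standard curve. The function names, the optional background subtraction and all numbers in the curve are hypothetical; the standard curve and exact conversion used in the study are not given in the text.

```python
from bisect import bisect_right


def fura2_ratio(emission_340, emission_380, background=0.0):
    """Ratio of 510 nm emission under 340 nm versus 380 nm excitation."""
    return (emission_340 - background) / (emission_380 - background)


def ratio_to_calcium(ratio, standard_curve):
    """Interpolate [Ca2+] from a standard curve of (ratio, concentration) pairs.

    `standard_curve` must be sorted by strictly increasing ratio. Ratios
    outside the calibrated range raise ValueError rather than extrapolating.
    """
    ratios = [r for r, _ in standard_curve]
    if ratio < ratios[0] or ratio > ratios[-1]:
        raise ValueError("ratio lies outside the calibrated range")
    i = bisect_right(ratios, ratio)
    if i == len(ratios):
        return standard_curve[-1][1]
    (r0, c0), (r1, c1) = standard_curve[i - 1], standard_curve[i]
    return c0 + (c1 - c0) * (ratio - r0) / (r1 - r0)


# Hypothetical standard curve: (340/380 ratio, free Ca2+ in µM).
HYPOTHETICAL_CURVE = [(0.5, 0.0), (1.0, 1.0), (2.0, 5.0)]
calcium_um = ratio_to_calcium(fura2_ratio(150.0, 100.0), HYPOTHETICAL_CURVE)
```

Using the ratio rather than a single-wavelength intensity cancels out differences in dye loading and liposome amount between wells, which is the usual reason for choosing a dual-excitation dye such as Fura-2.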
Crystallization. Forty microlitres of concentrated Vcx1 was mixed with 60 µl
monoolein to prepare the lipidic cubic phase (LCP) as previously described”.
Crystals were grown by adding 1 µl Vcx1/LCP mixture to a glass coverslip and
overlaying it with 2 µl crystallization solution (14% Jeffamine M600 pH 7.0, 100 mM
HEPES pH 7.0, 50 mM CaCl₂, 50 mM MnCl₂, 200 mM NaI). Coverslips were
sealed in a hanging-drop setup in 24-well trays containing 300 µl crystallization
solution. Crystals appeared in a subsequent sponge phase in approximately
3-4 days and grew to a maximum size of 250 µm. Crystals were harvested from
the trays and frozen directly in liquid nitrogen for data collection. Data were
collected at Advanced Light Source beamline 8.3.1, Advanced Photon Source
beamline 23-ID-B and Stanford Synchrotron Radiation Lightsource beamline 12-2.
Holmium heavy-atom derivatives were obtained by adding Ho(III)Cl₃ to the crystals
1 h before flash-cooling, either as salt or as a concentrated, aqueous solution.
Data processing. Data sets were processed using XDS* in space group R3. An
initial marginal molecular replacement solution was provided by the mjNCX
structure (Protein Data Bank accession 3V5U; 14% identity) using the PHENIX**®
AutoMR program and improved upon by PHENIX°*® AutoBuild. Initial iodine
and holmium heavy atom sites were located in anomalous difference maps
calculated in the CCP4*’ package using molecular replacement phases and a holmium
derivative data set. The heavy atom sites were refined using the program
AutoSHARP**, and subsequent density modification was performed using
RESOLVE”. Refinement of the structure was performed by PHENIX°® Refine
and the model was built using COOT”. The assignment of ions in the model was
aided by appropriate coordination by liganding side chains and anomalous
scattering at Cu Kα wavelength (8 keV; f″(I⁻) = 6.9 e⁻, f″(Mn²⁺) = 2.83 e⁻,
f″(Ca²⁺) = 1.3 e⁻). The final structural model was refined using data to 2.3 Å with
a crystallographic R-factor of 20.1% and a free R-factor of 22.5% (Supplementary
Table 1).

Comparative modelling and structural analysis. A comparative model of Vcx1
in the vacuole-facing conformation was constructed using MODELLER-9V11,
based on the mjNCX X-ray structure (Protein Data Bank accession 3V5U (ref.
17)). The alignment between the sequences of Vcx1 and mjNCX was obtained by
manually editing the alignments from UCSF Chimera and PROMALS3D. The
Vcx1 structure and model were analysed and visualized using UCSF Chimera and
PyMol, and electrostatic surfaces were calculated using APBS.

Molecular dynamics simulations. Molecular dynamics simulations were performed
with GROMACS4, using the CHARMM27 all-atom force field and
the TIP3P water model. Vcx1 was oriented in an implicit lipid bilayer using
PPM, then immersed in an explicit 1,2-dimyristoyl-sn-glycero-3-phosphocholine
(DMPC) lipid bilayer and water using CHARMM-GUI. Periodic boundary
conditions and a triclinic box with a volume of 604.326 nm³ were used. Two
independent simulations were carried out, one with and another one without the two
Ca²⁺ ions coordinating the Glu 230 and Asp 234 residues. Equilibration was performed by
three 10-ns-long runs, gradually increasing the temperature from 100 K to 300 K, in
the canonical (NVT) ensemble controlled by the Berendsen thermostat. The
positions of non-hydrogen atoms of Vcx1 were restrained by a harmonic potential,
with gradually decreasing force constant. A final equilibration step was carried out
for 20 ns without restraints, in the isothermal-isobaric (NpT) ensemble controlled
by the semi-isotropic Berendsen barostat. Each production run was 100 ns long, in
the NpT ensemble controlled by the Bussi-Donadio-Parrinello thermostat and
the semi-isotropic Parrinello-Rahman barostat.
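
The simulation schedule described above can be summarized as a small bookkeeping sketch. This is not the GROMACS input used for the study; the class and variable names are illustrative, and the sketch assumes one production run per simulation, which the text does not state explicitly. The intermediate temperatures and restraint force constants are not given in the text and are therefore left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One stage of the protocol as described in the text (names are illustrative)."""
    name: str
    ensemble: str          # "NVT" or "NpT"
    length_ns: float       # length of a single run in this stage
    repeats: int = 1       # number of consecutive runs
    restrained: bool = False


# Stages and durations taken from the Methods text.
PROTOCOL = [
    Stage("heating equilibration (100 K to 300 K)", "NVT", 10.0, repeats=3, restrained=True),
    Stage("unrestrained equilibration", "NpT", 20.0),
    Stage("production", "NpT", 100.0),
]


def total_time_ns(stages, n_systems=1):
    """Total simulated time in ns for n_systems independent simulations."""
    return n_systems * sum(s.length_ns * s.repeats for s in stages)


per_system = total_time_ns(PROTOCOL)                # one simulation
both_systems = total_time_ns(PROTOCOL, n_systems=2)  # with and without Ca2+
print(f"{per_system:.0f} ns per system, {both_systems:.0f} ns in total")
```

Under these assumptions each system accumulates 150 ns of simulated time, of which 100 ns is production.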
Structural basis of histone H2A-H2B recognition by
the essential chaperone FACT


Maria Hondele, Tobias Stuwe, Markus Hassler, Felix Halbach, Andrew Bowman, Elisa T. Zhang, Bianca Nijmeijer,
Christiane Kotthoff, Vladimir Rybin, Stefan Amlacher, Ed Hurt & Andreas G. Ladurner


Facilitates chromatin transcription (FACT) is a conserved histone
chaperone that reorganizes nucleosomes and ensures chromatin
integrity during DNA transcription, replication and repair.
Key to the broad functions of FACT is its recognition of histones
H2A-H2B (ref. 2). However, the structural basis for how histones
H2A-H2B are recognized and how this integrates with the other
functions of FACT, including the recognition of histones H3-H4
and other nuclear factors, is unknown. Here we reveal the crystal
structure of the evolutionarily conserved FACT chaperone domain
Spt16M from Chaetomium thermophilum, in complex with the
H2A-H2B heterodimer. A novel ‘U-turn’ motif scaffolded onto a
Rtt106-like module embraces the α1 helix of H2B. Biochemical
and in vivo assays validate the structure and dissect the contribution
of histone tails and H3-H4 towards Spt16M binding.
Furthermore, we report the structure of the FACT heterodimerization
domain that connects FACT to replicative polymerases. Our
results show that Spt16M makes several interactions with histones,
which we suggest allow the module to invade the nucleosome gradually
and block the strongest interaction of H2B with DNA. FACT
would thus enhance ‘nucleosome breathing’ by re-organizing the
first 30 base pairs of nucleosomal histone-DNA contacts. Our
snapshot of the engagement of the chaperone with H2A-H2B
and the structures of all globular FACT domains enable the
high-resolution analysis of the vital chaperoning functions of FACT,
shedding light on how the complex promotes the activity of enzymes
that require nucleosome reorganization.

The essential heterodimeric chaperone FACT destabilizes nucleosomes
to promote polymerase progression on chromatin templates
and maintains chromatin structure in vivo. The recognition of the
histone H2A-H2B heterodimer is crucial for the molecular functions of
FACT. To map the region(s) specifically responsible for H2A-H2B
binding, we tested all globular domains within FACT using pull-down
assays. Biochemical dissection of yeast FACT (composed of the
Spt16-Pob3 heterodimer) had identified four globular domains (the
Spt16 amino-terminal domain (Spt16N), the heterodimerization
domain Spt16D-Pob3N, the middle domain of Spt16 (Spt16M) and
the middle domain of Pob3 (Pob3M)) and carboxy-terminal acidic
stretches (Spt16C and Pob3C) (Fig. 1a). We find that only Spt16M,
where most of the genetically identified, functionally deficient
mutations cluster (Fig. 1a), recognizes H2A-H2B similarly to full-length
Spt16 (Fig. 1b). Human Spt16M (encoded by the SUPT16H gene) also
binds H2A-H2B (Fig. 1c), consistent with the evolutionary sequence
conservation of FACT (Supplementary Fig. 1). This identifies Spt16M
as a conserved binding module for H2A-H2B.

Figure 1 | The histone chaperone complex FACT recognizes the histone
H2A-H2B heterodimer through the Spt16M domain of Spt16. a, Domain
organization of yeast Spt16. Mutants isolated in S. cerevisiae (black lines) and a
loss-of-function deletion in human Spt16 are indicated. b, c, V5-immunoprecipitations
of yeast (b) and human (c) Spt16M with H2A-H2B.
V5-hc and V5-lc denote the heavy and light chain, respectively, of the V5
antibody. d, Crystal structure (2.35 Å) of the tethered (~25-residue linker, no
electron density observed, grey dotted line) complex between C. thermophilum
Spt16M (residues 647-950, green and blue), histone H2A (13-106, yellow) and
histone H2B (24-122, red). H2A^G and H2B^G denote the globular domains.

To characterize how Spt16M engages H2A-H2B, we determined
the structures of free Chaetomium thermophilum Spt16M (2.0 Å
resolution; Supplementary Fig. 2 and Supplementary Table 1) and a
tethered complex with the globular H2A-H2B heterodimer (2.35 Å
resolution; Fig. 1d and Supplementary Table 2). The core of Spt16M
is composed of a tandem pleckstrin homology-like (PHL) module
(PHL-1 and PHL-2) structurally related to the H3-H4 chaperones
Pob3M and Rtt106 (refs 7-10, 12 and Supplementary Fig. 3).
Crucially, only Spt16M contains a C-terminal, α-helical U-turn motif
that is patched onto the PHL-2 scaffold and recognizes H2A-H2B
(Fig. 1d and Supplementary Fig. 3b). The U-turn motif is the most
conserved and only extended hydrophobic patch on Spt16M
(Supplementary Fig. 2). It forms a groove complementary to a hydrophobic
patch on the N-terminal α1 helix of H2B (Fig. 2a, b and Supplementary
Fig. 2c). The conserved Spt16M residues Leu 915, Val 919, Ile 920,
Phe 931, Phe 939 and Leu 940 engage the H2B residues Ile 36 and
Tyr 39. Additional interactions include those with loop L1 and helix
α2 of H2B to establish a ~660 Å² interface with a free energy
potential of −7.1 kcal mol⁻¹. Comparison of free (C. thermophilum and
Saccharomyces cerevisiae) and histone-bound Spt16M reveals few
differences in the backbone of either chaperone or histones
(Supplementary Fig. 4), suggesting rigid docking. Isothermal titration
calorimetry (ITC) reveals endothermic binding with a ~400 nM
dissociation constant (Kd) and 1:1 stoichiometry (Supplementary Fig. 5),
consistent with the observed hydrophobic contacts between Spt16M
and H2B.
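
For orientation, a dissociation constant can be converted into a standard binding free energy with ΔG° = RT ln(Kd), taking 1 M as the standard state. The sketch below is illustrative and not part of the original analysis; the function name is hypothetical, and the default temperature of 25 °C is an assumption taken from the ITC conditions in the full Methods.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def binding_free_energy(kd_molar: float, temp_k: float = 298.15) -> float:
    """Standard binding free energy in kcal/mol from Kd (in M), 1 M standard state."""
    return R_KCAL * temp_k * math.log(kd_molar)


# ~400 nM dissociation constant reported for Spt16M and H2A-H2B.
dg_spt16m = binding_free_energy(400e-9)
print(f"dG = {dg_spt16m:.1f} kcal/mol")
```

With these inputs the value is close to −8.7 kcal mol⁻¹. It is a solution binding free energy derived from the measured Kd and is not directly comparable to the interface free energy potential computed from the crystal structure.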

To validate the interactions, we used biochemical, thermodynamic,
site-directed mutagenesis and in vivo assays. Pull-down assays show
that a construct containing the U-turn motif and PHL-2 is sufficient to
recognize both full-length and tailless H2A-H2B (Supplementary
Fig. 6a). Constructs consisting solely of the U-turn, or PHL-1/PHL-2
module, aggregate during purification, consistent with the
hydrophobic core shared between PHL-2 and the U-turn. Wild-type
Spt16M forms a complex with H2A-H2B in size-exclusion
chromatography (SEC) that is consistent with 1:1:1 stoichiometry (Fig. 2c). By
contrast, the Spt16M U-turn mutant Asn916Ser/Val919Ser/Ile920Ser/
Thr923Ser fails to form a complex with full-length
histones H2A-H2B by SEC and ITC (Fig. 2c, d), although its structure is
preserved (Supplementary Fig. 6b).

Figure 2 | A conserved, hydrophobic groove in the U-turn motif of Spt16M
interacts with a hydrophobic patch of H2B. a, Close-up view of the
Spt16M-H2A-H2B interface. Side chains of the H2B α1 helix (H2Bα1; red) nestle into
the hydrophobic groove formed by the Spt16M (green) U-turn motif (marine).
b, Primary sequence of residues in the U-turn motif. Hydrophobic residues
(green) tend to be conserved (pink). c, d, Wild-type (WT) Spt16M but not an
engineered U-turn mutant forms a complex with full-length H2A-H2B in
gel-filtration experiments (c) and ITC (d). e, ITC between wild-type Spt16M and
various histone constructs; ITC profiles and fitting data are given in
Supplementary Fig. 5. Data are mean ± s.d.

On the histones’ side, mutation of the hydrophobic H2B α1 helix
residue Ile 36 reduces affinity 30-fold (Fig. 2e and Supplementary
Fig. 5). By contrast, mutation of two other prominent hydrophobic
surfaces on the H2A-H2B heterodimer, the C-terminal H2A region
and Tyr 80 in helix α2 of H2B, does not alter the Spt16M interaction. In
agreement, a H2B peptide spanning the H2B α1 helix (residues 26-48)
binds Spt16M with low micromolar affinity. Together, these assays
validate the hydrophobic, globular interface established by the U-turn
and H2B α1 helix as a primary interaction region between the
chaperone FACT and H2A-H2B.
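
A fold change in affinity corresponds to a change in binding free energy of ΔΔG = RT ln(fold change). The short sketch below applies this to the 30-fold reduction reported for the Ile 36 mutant; it is an illustration rather than a value reported in the paper, the function name is hypothetical, and 25 °C is assumed.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def ddg_from_fold_change(fold_change: float, temp_k: float = 298.15) -> float:
    """Free-energy penalty (kcal/mol) for a given fold reduction in affinity."""
    return R_KCAL * temp_k * math.log(fold_change)


# 30-fold loss of affinity on mutating H2B Ile 36.
ddg_ile36 = ddg_from_fold_change(30.0)
print(f"ddG = {ddg_ile36:.2f} kcal/mol")
```

Under these assumptions the penalty is about 2 kcal mol⁻¹.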
Figure 3 | Multiple interactions support histone binding by FACT, but
Spt16M-mediated contacts are key to chaperoning function. a, V5-Spt16M
pulls down an H2B N-terminal tail peptide encompassing residues 11-30, but
not residues 1-20. Mutation of a conserved acidic patch on PHL-2 disrupts
binding. b, ITC of various (truncated) Spt16 constructs shows that domains
other than Spt16M contribute exothermically to the overall interaction with
H2A-H2B. c, Chaperoning assay. Pre-incubation of H2A-H2B with full-length
Spt16 or Spt16M, but not Spt16ΔM, prevents histone-driven
precipitation of DNA and rescues the soluble H2A-H2B-DNA complex (left).
Quantification of the H2A-H2B-DNA complex (lanes 7, 12, 17 and 22) was
carried out in quadruplicate (right). AU, arbitrary units. d, Wild-type Spt16
rescues a Δspt16 strain in vivo, whereas U-turn or acidic patch mutants mostly
cannot. Protein expression was verified by western blot against an N-terminal
V5-tag. FOA, 5-fluoroorotic acid. e, Streptavidin-mediated pull down of
biotinylated H3 peptides.
Electrostatic interactions, often involving the basic histone tails,
support histone-chaperone interactions. For Spt16M and H2A-H2B,
the equilibrium dissociation constants are similar in the presence
and absence of histone tails (Fig. 2e). However, deletion of the H2B
N-terminal tail disrupts the chaperone-histone complex in SEC and
accelerates disassembly of the complex (Supplementary Fig. 7a, b).
Furthermore, a peptide encompassing H2B residues 11-30, but not
1-20, directly binds the chaperone (Fig. 3a). Interestingly, the
Spt16M-H2A-H2B structure reveals an electrostatic crystal contact
(450 Å², free energy potential of +1.2 kcal mol⁻¹) mediated by
Glu 899, Asp 902 and Asp 905 on PHL-2 and H2A Arg residues
(Supplementary Fig. 7c), which could be replaced by positively charged
residues of the H2B tail. Consistently, mutation of the acidic patch
(Asp902Ala, Ser903Ala, Asp905Ala), but not mutation of
the U-turn, abolishes interaction with H2B (11-30) (Fig. 3a) and
lowers the Kd for full-length H2A-H2B ~4-fold (Supplementary Fig. 7d).
Our data indicate that the H2B tail mediates the kinetic stability of
the complex rather than determining its equilibrium stability, which
depends on the interactions between the globular cores of H2A-H2B
and the Spt16M U-turn.

Deletion of the C-terminal region of human Spt16 (termed
FACTΔC) abrogates H2A-H2B binding, chaperone activity and
cellular viability. In light of our structure, it is clear that in addition to
the acidic C-terminal tail of Spt16 (termed Spt16C; ref. 12), FACTΔC
lacks the entire and essential U-turn motif and most of PHL-2 (Fig. 1a).
To refine the contribution of Spt16M to histone binding and
chaperone function further, we compared H2A-H2B binding by full-length
Spt16 with truncated constructs using ITC. Both Spt16M and
Spt16M plus acidic C terminus (Spt16MC) display an endothermic
binding site (Kd ~ 400 nM). However, Spt16MC adds a second,
exothermic binding site (Kd ~ 30 nM; Fig. 3b), consistent with an
independent, electrostatic histone interaction site mediated by Spt16C.
These values compare favourably with the 30-90 nM H2A-H2B
affinity reported for holo-FACT and full-length Spt16 using independent
methods. Furthermore, Spt16N and Spt16D together (Spt16ND)
bind H2A-H2B exothermically, albeit with low affinity
(Kd = 10-100 µM). ITC profiles of full-length Spt16 and of Spt16 lacking
Spt16M (Spt16ΔM) combine the characteristics of the isolated Spt16M,
Spt16MC and Spt16ND domains. Thus, quantitative ITC reveals two
high-affinity sites: the hydrophobic interaction seen in our
Spt16-H2A-H2B complex, and an electrostatic Spt16C interaction.

Crucially, whereas full-length Spt16 prevents histone-DNA
aggregates, a construct lacking Spt16M but containing the high-affinity,
electrostatic Spt16C site (Spt16ΔM) cannot (Fig. 3c). By contrast,
Spt16M alone resolves aggregates (Fig. 3c), indicating that the
interaction of Spt16M with the globular H2A-H2B core is essential to
chaperone function.

To test the role of key residues in vivo, we rescued the lethality of a 
yeast spt16 deletion strain with mutant Spt16 proteins. Mutation of
U-turn or acidic patch residues does not reduce the in vivo stability of 
Spt16, but mostly fails to rescue viability (Fig. 3d). Deletion of Spt16C is 
also lethal. However, because Spt16C contains a putative nuclear local- 
ization signal required for nuclear localization (Supplementary Fig. 8), 
the lethality cannot be directly attributed to a deficient nuclear function. 

In addition to binding H2A-H2B, FACT recognizes H3-H4 (ref. 2).
Because the tandem PHL core of Spt16M is structurally related to the
H3-H4 chaperones Pob3M (ref. 12) and Rtt106 (refs 7, 8, 10), we tested
H3-H4 binding and find that Spt16M binds both full-length and
tailless H3-H4 (Supplementary Fig. 9a, b). Similarly, S. cerevisiae Spt16M
binds H3-H4 with 2.5 µM affinity. Importantly, U-turn mutants
retain the H3-H4 interaction (Supplementary Fig. 10), suggesting that
H3-H4 and H2A-H2B have distinct binding interfaces on Spt16M.

The interaction between Spt16M and H3-H4 probably occurs
through a region encompassing histone H3 residues 46-65 (Fig. 3e),
which is also recognized by Rtt106, preferentially in Lys 56-acetylated
form. Spt16M binding of the H3(46-65) peptide is preserved after
Lys 56 acetylation (Supplementary Fig. 9d), although future work needs
to clarify whether Lys 56 acetylation affects FACT function in vivo.

Furthermore, we solved the structure of the FACT heterodimerization
domain (Fig. 4a and Supplementary Table 3). Spt16D-Pob3N
also consists of PHL domains, a single PHL in Spt16D and a tandem
PHL domain lacking the capping helix of the second domain in
Pob3N. Interestingly, the PHL module of Spt16D-Pob3N does not
interact with H2A-H2B. Nor does it bind H3-H4, in contrast to the
tandem PHL modules of Spt16M, Pob3M and Rtt106 (Fig. 1b and
Supplementary Fig. 9c). Yet, extended surface patches show high
sequence conservation (Supplementary Fig. 11), suggesting a distinct
but conserved molecular function. We used S. cerevisiae lysates
expressing tandem affinity purification (TAP)-tagged proteins to screen for
proteins co-precipitating with Spt16D-Pob3N, and identified the large
subunit of the DNA polymerase α complex (Pol1) as a Spt16D
interactor (Fig. 4b). Our assay suggests that the FACT heterodimerization
domain couples FACT to the replication machinery, promoting
nucleosome deposition during replication.

The high-resolution snapshot of the Spt16M-H2A-H2B complex,
together with the structure of the FACT heterodimerization domain,
completes the domain-by-domain dissection of FACT structure
(Supplementary Fig. 12): Spt16N, Spt16M and Pob3M bind H3-H4,
whereas Spt16M binds H2A-H2B. Consistent with the pleiotropic
functions of FACT, the interaction between H2B and the Spt16M
U-turn is unlikely to be directly affected by H2B heterodimers
containing non-canonical H2A variants (for example, H2A.X and macroH2A)
or by post-translational modifications including ubiquitination, which
has a role in FACT function (Supplementary Fig. 13).

Our structures serve as a platform for investigating the
mechanism(s) by which holo-FACT couples H2A-H2B recognition to
nucleosome reorganization. This can be illustrated by a superposition
of Spt16M-H2A-H2B onto the nucleosome core particle (NCP)
(Supplementary Fig. 14). We suggest that the solvent-accessible H2B
N-terminal tail may mediate first interactions of FACT with the
nucleosome. The Spt16M chaperone capitalizes on the dynamic nature
of the NCP, in particular the constant and progressive unwrapping/
rewrapping of DNA from the octamer core, to invade the NCP
gradually and develop stronger interactions with the two
DNA-covered binding patches on the H3 αN and H2B α1 helices.
Shielding of a histone's DNA-interaction site is typical for histone
chaperones. Together, these multiple contact points establish an
extended surface that coordinates the outermost ~30 base pairs.
Consistently, this DNA becomes hypersensitive to chemical
modification in the presence of holo-FACT. In perfect agreement with recent
biochemical studies of FACT-facilitated Pol II transcription through
nucleosomes, our structural data rationalize how FACT promotes
nucleosome ‘breathing’ and stabilizes reorganized, partially
dissociated, more accessible nucleosome forms, assisting the passage of
polymerases without NCP disassembly to ensure chromatin integrity.
Figure 4 | The heterodimerization domain of FACT mediates interaction
with the DNA replication machinery. a, Cartoon representation of the
Spt16D (green) and Pob3N (magenta) PHL domains. b, The Spt16D domain of
the FACT complex pulls down replicative Pol1 from yeast whole-cell extracts,
as detected by western blot against TAP-tagged Pol1.
METHODS SUMMARY


The C. thermophilum Spt16M domain (residues 651-944) was fused to histone
H2B (residues 24-122) by a 12-residue linker and was co-expressed with H2A
(residues 13-106). Tetragonal crystals of the native complex (space group P4₃2₁2)
were grown at 4 °C or 10 °C from hanging drops. High-resolution data sets were
collected at beamlines PXIII (Swiss Light Source, Villigen, Switzerland) and
ID23-2 (European Synchrotron Radiation Facility, Grenoble, France). ITC was
performed at 20 °C in 200 mM NaCl, 25 mM Tris, pH 7.5.


Full Methods and any associated references are available in the online version of 
the paper. 
METHODS


Protein expression and purification. The C. thermophilum Spt16M domain
(residues 651-944) was cloned into pETMCN-6xHis, carrying an N-terminal
6×His tag and tobacco etch virus (TEV) protease cleavage site (leaving an
N-terminal overhang of the residues Gly-Met-Glu, in which Glu corresponds to
residue 647 of Spt16M; clone CL2537). For expression of the complex, the Spt16M
construct was fused to a 12-residue GGSGGSGGSGGS linker and the globular
domain of H2B (residues 24-122). The construct (clone CL2807) was
co-expressed with globular H2A lacking the hydrophobic C terminus (residues
13-106). C. thermophilum Pob3N (residues 1-192) was cloned into pETMCN-6xHis
(ampicillin selection), carrying an N-terminal 6×His tag and TEV protease
cleavage site (leaving an N-terminal overhang of the residues Gly-Met-Glu; clone
CL2060), and co-expressed with an untagged version of Spt16D (residues
521-651; clone CL2558) under kanamycin selection.

Constructs were transformed and grown in Escherichia coli
BL21-CodonPLUS(DE3)-RIL cells to an attenuance (D600 nm) of 0.7 and induced with
0.4 mM isopropyl β-D-thiogalactoside (IPTG) in rich medium at 18 °C for 16 h.
Selenomethionine-labelled protein was expressed in strain B834 (DE3) and
induced for 18 h with 0.5 mM IPTG in TB media with 40 µg ml⁻¹
seleno-L-methionine at 18 °C. Cells were resuspended in 50 mM Tris, pH 7.5, 500 mM
NaCl, 10 mM imidazole, and EDTA-free protease inhibitor cocktail (Roche
Complete), lysed by sonication, and centrifuged at 45,000g for 60 min. The
supernatant was loaded onto a column packed with Ni-sepharose high performance
beads (GE Healthcare), washed with lysis buffer, and eluted in the same buffer with
a linear gradient of imidazole from 0 to 500 mM. Elutions were dialysed overnight
in a buffer containing 25 mM Tris, pH 7.5, 400 mM NaCl and 5 mM dithiothreitol
(DTT) and subsequently concentrated to 10 mg ml⁻¹ using a Vivaspin 15R 10,000
molecular mass cut-off concentrator. The protein was then further purified on a
Superdex 75 HR16/60 (for the chaperone-histone complex: SD 200 HR16/60)
column (GE Healthcare). Fractions were pooled and the 6×His tag was cleaved
with TEV protease for 20 h at 4 °C and dialysed into a buffer containing 25 mM
Tris, pH 8.0, 150 mM NaCl and 5 mM DTT (chaperone-histone complex: 25 mM
HEPES, pH 8.5, 500 mM NaCl, 2 mM DTT). The protein was bound to a MonoQ
HR5/5 (heterodimerization domain: MonoS HR5/5, chaperone-histone complex:
MonoS HR10/10) ion exchange column (GE Healthcare) and eluted running a
linear gradient of 50 column volumes of elution buffer containing 25 mM Tris,
pH 8.0, 1 M NaCl and 5 mM DTT. Fractions were pooled and dialysed against
25 mM Tris, pH 8.0, 150 mM NaCl and 5 mM DTT (complex: 25 mM HEPES,
pH 8.5, 500 mM NaCl, 1 mM Tris(2-carboxyethyl)phosphine (TCEP)).
Site-specific mutations were introduced by PCR and purified like wild-type Spt16M.
Recombinant histones were purified and refolded, as described.
Crystallization and data collection. Orthorhombic crystals belonging to space
group P2₁2₁2₁ of selenomethionine-labelled and native Spt16M (form A;
Supplementary Table 1) were grown at room temperature from hanging drops
composed of 1 µl of protein (3 mg ml⁻¹) and 1 µl of crystallization buffer (6% (v/v)
PEG 8000, 100 mM Na-cacodylate, pH 5.5, 200 mM Ca-acetate hydrate)
suspended over 0.5 ml of the latter. Crystals were transferred into 100% Paratone-N
and frozen in liquid nitrogen. Single-wavelength anomalous dispersion data were
collected at beamline PXII (Swiss Light Source (SLS), Villigen, Switzerland). A
higher-resolution native data set was acquired at beamline ID-23-1 (European
Synchrotron Radiation Facility (ESRF), Grenoble, France). Data processing and
scaling were done with XDS. Tetragonal crystals of the native complex (space
group P4₃2₁2, Supplementary Table 2) were grown at 4 °C or 10 °C from hanging
drops composed of 1 µl protein (15 mg ml⁻¹) and 1 µl crystallization buffer (7.25%
(v/v) PEG 8000, 0.2 M MgCl₂, 0.1 M Tris, pH 7.8) suspended over 1 ml of the latter.
Crystals were cryoprotected by stepwise soaking in up to 20% glycerol in
crystallization buffer and frozen in liquid nitrogen. High-resolution data sets were
collected at beamlines PXIII (SLS, Villigen, Switzerland) and ID23-2 (ESRF, Grenoble,
France). Data processing and scaling were done with XDS and Scala.
Pob3N-Spt16D crystals grew in space group P2₁2₁2₁ (Supplementary Table 3)
using the same set up as above in 2.2 M (NH₄)₂SO₄, 0.2 M Na-K-tartrate and 0.2 M
Na₃-citrate, pH 5.6. Crystals of Pob3N-Spt16D were cryoprotected in
crystallization buffer supplemented with 20% ethylene glycol. Single-wavelength anomalous
dispersion data were collected at beamline PX02 (SLS, Villigen, Switzerland). A
higher-resolution native data set was acquired at beamline ID-23-eh1 (ESRF,
Grenoble, France). Data processing and scaling were done with XDS.

Structure determination and refinement. For Spt16M and Spt16D-Pob3N,
single-wavelength anomalous dispersion data were used to locate six selenium
sites with Phenix AutoSolve, which further carried out site refinement, phasing,
density modification and phase extension. Secondary structure elements were
identified and an initial model was built using Arp/Warp. The structure was
completed in alternating cycles of model correction in COOT and restrained
refinement in Refmac5 (refs 35, 37). The model was further used to determine
the structure of the native data set by molecular replacement with PHASER. For
the structure of the complex, a PHASER molecular replacement solution was
determined using the Spt16M structure determined here and the histone
H2A-H2B heterodimer from the structure of the canonical nucleosome core particle.
The structure was finalized by iterative cycles of model adjustment in COOT and
refinement in Refmac5 and PHENIX. Structural visualization was done using
Pymol. Electrostatic surface potentials were calculated using APBS. Structural
superpositions were calculated with 3dSS (ref. 41).

ITC. Binding affinities of wild-type Spt16M with H2A peptide, residues 108-130
(N-acetylated, with a C-terminal Tyr) and H2B peptides, residues 26-48
(N-acetylated, C-amidated), were determined at 25 °C by using VP-ITC and
iTC200 calorimeters (GE Life Science, MicroCal). For peptide-protein interaction
studies, proteins and peptides were dialysed against ITC buffer (25 mM Tris,
pH 7.5, 50 mM NaCl). Injections consisted of 10 µl of peptide (600 µM) into
20 µM protein at 5-min intervals at 25 °C. For protein-protein interaction studies
of Spt16M with constructs of histones H2A-H2B, proteins were dialysed against
ITC buffer (25 mM Tris, pH 7.5, 200 mM NaCl). Injections on the VP-ITC
instrument consisted of 10 µl of Spt16M (325 µM) into 20 µM H2A-H2B dimer at
5-min intervals at 25 °C and of 1 µl injections of 250 µM chaperone into 25 µM
H2A-H2B on the iTC200. Data were analysed using Origin software (version 5.0).
A single binding site model for Spt16M gave the best fit to the data, whereas
Spt16MC had to be fitted with two independent binding sites. Errors are given
as s.d. of the fit from the original data points.

Histone refolding and gel filtration. Histone refolding was performed as
described, with modifications: full-length and globular histones were mixed at
equimolar ratios to a final concentration of 1 mg ml⁻¹ and refolded in 25 mM Tris,
pH 7.5, 150 mM NaCl and 5 mM DTT. H2A-H2B dimers as well as (H3-H4)₂
tetramers were subsequently purified by gel-filtration chromatography using a
Superdex 75 HR16/60 column (GE Healthcare). Histones and Spt16M were mixed
at equimolar ratios and incubated on ice for 30 min. Proteins were separated on a
Superdex 75 or Superdex 200 10/300 GL column at 25 mM Tris, pH 7.5, 300 mM
NaCl and 2 mM DTT.

Native PAGE analysis of Spt16 chaperoning function. Full-length Spt16 and Spt16ΔM
were expressed and purified as Spt16M. A 54-base-pair DNA fragment was
synthesized as two complementary oligomers, which were then annealed. The ratio of
H2A-H2B to DNA that caused close to complete precipitation was determined
experimentally at a ratio of three molar equivalents of histone dimer to DNA.
Histone dimer (1.2 µM) was preincubated with 0.4, 0.8, 1.6, 3.2 and 6.4 µM of
full-length Spt16, Spt16ΔM and Spt16M in 10 mM Tris-HCl, pH 7.4, 100 mM NaCl and
1 mM DTT. Binding of chaperone to histone was allowed to proceed at 25 °C for
15 min before the addition of DNA to a final concentration of 0.4 µM in a total
reaction volume of 20 µl. In addition, controls containing chaperone at the
concentration corresponding to the highest titration point with DNA alone were also
carried out. Precipitation was carried out at 25 °C for 1 h before the addition of 5 µl
of 20% (w/v) sucrose, removal of precipitates by centrifugation and separation of
the remaining soluble complexes on a 9% polyacrylamide gel run in 0.2× TBE
buffer. The gels were stained with ethidium bromide before visualization and
quantification using a Fusion-FX7 Advance (PeqLab) imaging system. Statistics
were calculated on a quadruplicate repeat of the experiment, with a two-tailed
t-test assuming equal variance. Asterisks indicate P values of less than 0.05 when
compared to the control without chaperone.
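
The statistical comparison described above, a two-tailed t-test assuming equal variance on quadruplicate measurements, can be sketched as follows. The intensity values are hypothetical placeholders and not data from the paper, and the function name is illustrative. To stay within the standard library, the sketch compares the t statistic with the tabulated two-tailed critical value for six degrees of freedom rather than computing an exact P value.

```python
import math
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Student's t statistic and degrees of freedom, assuming equal variances."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / std_err, df


# Hypothetical band intensities (arbitrary units), four replicates each.
control = [1.0, 1.2, 0.9, 1.1]
with_chaperone = [2.0, 2.3, 1.9, 2.2]

# Tabulated two-tailed critical value of t for alpha = 0.05 and df = 6.
T_CRIT_DF6 = 2.447

t_stat, df = pooled_t_statistic(with_chaperone, control)
significant = df == 6 and abs(t_stat) > T_CRIT_DF6
print(f"t = {t_stat:.2f}, df = {df}, P < 0.05: {significant}")
```

With two groups of four replicates the test has six degrees of freedom, so any |t| above 2.447 corresponds to P < 0.05.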

V5 immunoprecipitations. A total of 15 µl of anti-V5-agarose beads (Sigma) was
incubated with 40 µg of E. coli-expressed, gel-filtration- and
ion-exchange-purified V5-fused Spt16 or Pob3 construct for 30 min rotating at 4 °C in 25 mM
Tris, pH 7.5, 150 mM NaCl and 0.05% Nonidet P-40 detergent. Beads were washed
three times with 1 ml buffer. For interaction with histones, beads were incubated
with refolded H2A-H2B in fivefold excess of histone for 1 h at 4 °C. Beads were
washed five times with 25 mM Tris, pH 7.5, 200 mM NaCl and 0.05% Nonidet
P-40. The samples were either directly boiled in SDS-loading buffer or eluted for
30 min with 25 µl V5 peptide (2 mg ml⁻¹) (sequence, Ac-YGKPIPNPLLGLDST)
at room temperature. Samples were subsequently analysed by SDS-PAGE. For
interaction with H2B or H3 peptides, beads were washed twice with 25 mM
Tris, pH 7.5, 600 mM NaCl and 0.05% Nonidet P-40 and twice with 25 mM
Tris, pH 7.5, 0.05% Nonidet P-40 and 75 (H2B) or 150 (H3) mM NaCl. Two
microlitres of 10 mg ml⁻¹ peptide were incubated with the beads in 300 µl of
the respective buffer for 2 h at 4 °C. Beads were washed four times with 1 ml buffer
and bound peptides eluted twice with 10 µl 25 mM Tris, pH 7.5, 1 M NaCl and
0.05% Nonidet P-40. Samples were analysed by SDS-PAGE (NuPAGE BisTris
4-12%, run only for 75% of the length) and silver stain (Invitrogen SilverQuest kit).
Biotin-streptavidin immunoprecipitations. A total of 25 µl of streptavidin
dynabeads (T1, Invitrogen) was saturated with 20 µl of 10 mg ml⁻¹ H3 peptides
for 1 h rotating at 4 °C in 25 mM Tris, pH 7.5, 150 mM NaCl and 0.05% Nonidet
P-40 detergent. Beads were washed three times with 1 ml buffer. Recombinant
Spt16M was incubated with the beads for 2 h rotating at 4 °C. Beads were washed
five times with 1 ml buffer, bound protein was eluted by boiling with Laemmli SDS
loading buffer and analysed by SDS-PAGE and Coomassie staining.
Phenotypic analyses in S. cerevisiae. To determine the effect of Spt16M mutations
on yeast cell growth, Spt16 was deleted from S. cerevisiae strain W303 by
homologous recombination introducing a TRP cassette as selection marker. The
associated lethal phenotype was rescued using a plasmid (YCplac33) carrying
wild-type Spt16 from S. cerevisiae (clone CL2303) as well as the URA3 gene that
was co-transformed using the lithium acetate/PEG method. Spt16 from C. thermophilum
(wild-type and mutants thereof; clones CL2924 (wild type), CL3046
(NVIT→A), CL3002 (NVIT→S), CL3001 (DFL→S), CL2978 (QD→A) and
CL2977 (DSD→A)) with an N-terminal V5-tag was cloned into YCplac111 carrying
the LEU2 gene. The Δspt16 strain with the URA rescue plasmid was transformed
with the mutant constructs under Leu selection and then subjected
to 5-fluoroorotic acid (FOA) selection. Thus, mutants depending on the presence
of wild-type Spt16 cannot grow on FOA plates. Transformants growing on selective
synthetic medium (SD −Leu) plates were grown for 5 h in YPAD medium and
subsequently plated by spotting 4 µl of tenfold serial dilutions onto −Leu FOA
plates and incubated at 24 °C for 4 days.

A problem shared


Graduate students often work alone, but programmes exist 
to teach them how to work towards publication in teams. 


BY CAMERON WALKER 


At 4 p.m. one spring Tuesday, nine
graduate students at Oregon State University
(OSU) in Corvallis kick their
research into gear. For more than an hour, they 
discuss their work on fisheries along the US 
west coast, moving from how data on marine 
biodiversity hotspots change over time to how 
to organize tables for a paper. 
These students, who have been meeting 
frequently for nearly two years, are part of 
the Dimensions of Biodiversity Distributed 


Graduate Seminar (DBDGS), a programme 
that ultimately included teams at 14 institu- 
tions — nine in the United States, two in Kenya 
and one each in Chile, Brazil and China. Each 
institution's group runs at least one research 
project, in which students, usually work- 
ing outside their main field, make intensive 
examinations of large, mostly pre-existing 
biodiversity data sets. They formulate research 
questions, analyse data and publish the results 
— sometimes in multiple papers, and usually 
with every student in the group as a co-author.
They use the DBDGS network to talk about
their work with students at other institutions,
share strategies and form collaborations. 

Whereas most established faculty members 
spend ample time collaborating with col- 
leagues, graduate students are often required 
to toil alone, notes Julia Parrish, a biologist 
at the University of Washington in Seattle 
and principal investigator on the three-year, 
US$1.5-million US National Science Founda- 
tion (NSF) grant that funded the DBDGS. But 
that means “training people to be last century’s 
scientists”, she says. By contrast, collaborative
seminars teach students how to work together 
in a way that is reflective of the contemporary 
research world. 

The distributed-seminar model originated at 
the US National Center for Ecological Analysis 
and Synthesis (NCEAS), a research centre of 
the University of California, Santa Barbara 
(UCSB), where, in 1997, graduate students 
from eight institutions first came together to 
analyse the science of different habitat-conser- 
vation plans and create a report. The DBDGS, 
which launched in 2011 with a pilot project at 
the University of Washington and is funded 
until the end of this year, is based on that tem- 
plate. Teams at each institution sent student 
and faculty representatives to five in-person 
meetings in Washington state throughout the 
programme — the largest, in February 2012, 
brought together close to 60 participants from 
14 institutions. Team members learned about 
each other’s work, shared the data that they 
would use, discussed how they would approach 
their projects and made connections for future 
collaborations. Student teams are now working 
on almost 30 papers; two have been published. 

Each team functions as an independent 
unit, with between 5 and 15 students working 
together and one or more faculty members 
offering guidance. Most of the students have 
an interest in biodiversity, but do not neces- 
sarily possess any training in the field that the 
project covers. At OSU, for example, students 
with backgrounds ranging from stream ecol- 
ogy to geosciences worked on fisheries data. 


LONELY PURSUIT 

The road to publication can be arduous, and 
programmes that bring graduate students 
together may help them to avoid pitfalls and 
stay competitive. The seminars teach PhD stu- 
dents how to find out what gaps exist in their 
fields, what methods they could use to analyse 
their own data, what data sets are already available
to work with, how to structure their writing
and which journals would be receptive
to a particular kind of paper.

Graduate students on the DBDGS team at 
the Virginia Institute of Marine Science (VIMS) 
in Gloucester Point, part of the College of Wil- 
liam and Mary, read research on fish diversity 
before diving into records of trawling in nearby 
Chesapeake Bay and measuring specimens in 
the collection of the Smithsonian Institution 
National Museum of Natural History in Wash- 
ington DC. That showed them what ground 
had already been covered on the subject, says 
Jonathan Lefcheck, a graduate student in 
marine community ecology who led the team, 
and demonstrated “where the gaps were, so we 
set ourselves up to have a marketable product”. 
The researchers also learned analysis methods 
and found appropriate citations in the litera- 
ture, which they intend to use in a paper. 

By looking at how previous studies were con- 
ducted, students could begin to consider how 
to analyse their own data sets. In some cases, 
DBDGS students already had experience with 
evaluating data, and could split up the work- 
load according to their strengths: one student 
sequencing fish genes, another doing statistics. 

The combined effort made it possible to 
tackle data sets that would overwhelm a single 
student, or even an individual lab. In the course 
of a year, the team at UCSB compiled global 
fisheries data from eight or nine sources — a 
project that few, if any, of the students would 
have pursued for their own dissertations, says 
Laura Dee, a graduate student in conservation 
and marine ecology at UCSB who led one of 
the teams. 

Lefcheck taught himself new methods, from
multivariate statistics to building evolutionary
trees, which he would not have attempted
without the backing of the group. “If it
became too much of a burden, I could ask
others to pitch in,” he says. Going beyond
his own dissertation work, Lefcheck read
primary literature and contacted authors as
well as students and faculty members in the
wider DBDGS network. He is now applying
many of these methods to his dissertation work.

For many students, the writing process
itself is daunting. Some DBDGS students went
on programme-sponsored writing retreats to 
help them tackle the process. At one, students 
and faculty members on the University of 
Washington team collaborated on a draft of a 
paper abstract and introduction — one person 
wrote for two minutes, then passed the piece 
of paper to the next person. “It helped us relax
a little bit, and just get into the writing mode,”
says Ailene Ettinger, a biology graduate student 
who co-led the team. 

The OSU team broke into two groups, each 
working on a paper using the same data. One 
used a traditional approach, dividing the paper
into sections for one or two students to tackle, 
with the team leader pulling them together. Stu- 
dents in the other group went to their retreat 
with sections that they had already written sep- 
arately. They assembled the paper as a group, 
using a projector to put it up on the wall and 
edit it line by line. “Having everyone in one 
place may have actually sped things up a bit,” 
says Selina Heppell, a marine fisheries ecologist 
at OSU and the team’s faculty adviser. Institu- 
tions may also offer one-off presentations from 
editors of scientific journals, who can provide 
insight into the publishing process. 


WORKING TOGETHER 

Throughout the DBDGS programme, students 
constantly practised how to interact with other 
researchers — whether explaining their work 
or negotiating how to distribute tasks — while 
keeping everyone on track to publication. 

Determining who gets credit for what when it 
comes to authorship is often difficult (see Nature 
489, 591–593; 2012). During the DBDGS, for
example, each group had to come to a consen- 
sus about what to do if people dropped out. The 
OSU team decided that anyone who stuck with 
and contributed to the project for the entire 
first year would be an author. Even though the 
students split into two groups, Heppell says, all 
have contributed something substantial to the 
work as a whole — and each of the 14 students 
will be on the seminar’s two papers. 

The University of Washington group decided 
to wait until it was most of the way through the 
data analysis before it chose a lead author. In the 
end, two students were designated joint leads, 
and flipped a coin to see whose name would 
be listed first. “Really open conversations and 
transparency from the beginning are the way to 
go, so that everyone knows what the expectations
are,” says Ettinger.

Cross-disciplinary collaborations have 
emerged between institutional teams. Parrish 
recalls that when students from UCSB and 
OSU first met, they were unsure of each other. 
The Oregon team was looking at what species 
came up in individual trawls in a single stretch 
of the Pacific Ocean; the Santa Barbara students 
were poring over global fisheries data from a 
conservation angle. As they discussed their 
projects, however, they realized that combin- 
ing the scales could yield interesting results. 
Now students from the teams are at work on 
an independent project — a collaboration that 
Kate Boersma, a graduate student in stream 
ecology at OSU, calls the most rewarding aspect 
of the programme. It has influenced her dis- 
sertation research, and she hopes that its effects 
will continue after she graduates. 

Parrish says that some of the international
groups have struggled because of funding,
travel or infrastructure difficulties at home. 
However, the team at the Federal University 
of Rio Grande do Sul in Porto Alegre, Brazil, 
has thrived: taking advantage of their strong 
backgrounds in statistics and mathematics, 
its students have published a paper. 

Graduate student Vanessa Weinberger, a 
team leader at the Pontifical Catholic Uni- 
versity of Chile in Santiago, says that she has 
benefited from working with a large team of 
people who do not share her background in 
theoretical ecology, and from the opportu- 
nity to present research in English. In Chile, 
she says, it is relatively unusual for graduate 
students from different laboratories — let 
alone different universities — to collaborate 
on a project outside their theses. 


BRANCHING OUT 
The DBDGS is set to end this year, but George 
Gilchrist, an NSF programme director, says 
that its good results have prompted the foun- 
dation to consider repeating the exercise. The 
NCEAS would also be open to hosting further 
distributed seminars, says deputy director 
Stephanie Hampton. The centre is currently 
hosting its first three-week institute for early- 
career researchers, which had more than 
400 applicants from around the world. The 
22 successful participants, including several 
graduate students, are learning skills for col- 
laborative, data-intensive ecological research. 
The programme runs from 19 June to 10 July, 
and the NCEAS plans to offer it each year. 
With a colleague, Helene Wagner, a land- 
scape ecologist at the University of Toronto 
Mississauga in Canada, has led two land- 
scape-genetics courses based on the distrib- 
uted-seminar model. Students could opt to 
participate in group projects using existing 
data sets and simulation studies. The first, 
conducted in person and online in 2010, 
resulted in five papers. Unlike the mostly 
single-institution DBDGS groups, each 
team was made up of a mix of students from
the 15 participating universities in North
America and Europe, including the Swiss
Federal Institute of Technology in Zurich
and Joseph Fourier University in Grenoble,
France. The second course took place entirely
online, mitigating student travel expenses;
even students who were not at a participating
institution could sign up. A similar course is
planned for 2014.

A seminar team at Oregon State University works through the trials of collaboration.

In the United Kingdom, Vitae, a Cam- 
bridge-based organization that focuses on 
researchers’ professional development, has 
offered publishing workshops with Mac- 
millan Science Communication in Lon- 
don (which has the same parent company 
as Nature). It also runs a course called The 
Collaborative Researcher, which brings 
together 40 researchers at a time to learn 
skills including communication, cultural 
awareness, planning and negotiation. 


SIDE PROJECTS 

Seminars such as these take up precious time, 
which is often in short supply for graduate 
students. Dee extols the skills and collabora- 
tions that she has gained from the DBDGS 
— but says that the process took longer 
than anyone expected. She estimates that it 
delayed her thesis by four months. But her 
CV, she hopes, will boast papers from both 
the seminar and her cross-seminar collabo- 
rations — and some of the work that she is 
doing with OSU students will be part of her 
dissertation. 

Heppell told students in her team — par- 
ticularly those who signed up to tackle big 
workloads, such as data analysis — that they 
needed to have a serious talk with their advis- 
ers about how much time it would take. She 
has not heard any complaints; in fact, advisers 
have commented that if students get publica- 
tions, the DBDGS is a good use of their time. 
“I think advisers realize that the stakes are
higher now,” she says, referring to research
funding challenges. “The jobs are fewer.”


Cameron Walker is a freelance writer based 
in Santa Barbara, California.

COLLABORATION
US and UK join forces 


The United States and Britain are 
launching a global research collaboration 
to address issues such as water supply and 
climate change in emerging nations. The 
5-year programme will fund up to a total of 
40 US and UK grants per year, says Richard 
Everitt, deputy director of the British 
Council USA in Washington DC. He did 
not disclose amounts but said that awards 
could last for up to three years. Funding 
will come from the British Council, the UK 
Department for Business, Innovation and 
Skills and the US Department of State. “We 
want to create a cadre of young researchers 
who can work with their counterparts 
from the emerging world,” says Everitt.

The initiative will form partnerships with 
universities in nations such as China, 
Brazil, India and Indonesia, and will seek 
grant proposals in September. 


RESEARCH IMPACT 
Bang not based on buck 


Grant size does not strongly predict 
scientific impact, according to a study 
published in PLOS ONE. The authors used
four measures — publications, citations, 
highly cited papers and citations of the 
most-highly cited paper — to score the 
impact of 374 researchers funded between 
2002 and 2006 by the Natural Sciences and 
Engineering Research Council of Canada. 
Grant sizes explained less than 30% of the 
variation (J.-M. Fortin and D. J. Currie 
PLOS ONE 8, e65263; 2013). Co-author 
David Currie, a biologist at the University 
of Ottawa, says, “Some very poorly funded 
people manage to do a great deal.”


UNITED KINGDOM 


Funding freeze critiqued 


A UK science-advocacy group says that
a repeated freeze to the government's
£4.6-billion (US$7-billion) science- 
research budget, announced on 26 June, 
will damage early-career researchers’ work 
and drive them to other nations. Science is 
Vital, formed to track the results of a 2010 
budget freeze, polled 868 UK researchers, 
and found that 70% of junior scientists 
have lost confidence in research careers
in Britain. Some 59% of respondents
applying for grants said their success rate 
had fallen; 39% of those with labs have 
recruited fewer PhD students and 19% 
could not recruit any. “Frustrated young 
researchers are leaving,” says Jennifer 
Rohn, chair of Science is Vital and a cell 
biologist at University College London.

THE OSTRACONS OF EUROPA


BY KEN HINCKLEY 


There was something transcendent
in the pattern etched into the
ice-bound Europan surface looming
53 kilometres above Ricardo Cuerta’s sub- 
mersible. The implacable gravity of Jupiter 
rewrote the great frozen palimpsest 
again and again, the pack ice 
heaved and rilled with fissures that 
hinted at the mysteries of the deep. 

That’s how he’d seen it from
orbit. Now the intense blue-white 
glare of the spotlights seemed to be 
all that prevented the eternal mid- 
night of the subsurface ocean from 
imploding his mind. 

Particulates clouded the super- 
cooled brine. Flurries of malformed 
magnesium sulphate flakes tumbled 
through the cones of light cast by 
the submersible and vanished again 
into the darkness. Ricardo floated, 
with nothing but the spotlights of 
the submersible and the sheer thrall of won- 
der between himself and the abyss. Even now, 
submerged within the shattered moon, he still 
couldn't fathom what that pattern meant. 

The black chimneys of a cryovolcano 
rose out of the gloom like a city of diseased 
skyscrapers. Ricardo torqued the joystick 
between his thumb and forefinger, apply- 
ing just enough pressure to manoeuvre the 
perspex tube at the end of the armature a 
little closer. He needed a sample, had to 
bring back proof — if not for the cold gaze of 
Science, then at least to convince himself that 
he wasn't confabulating wonders in the dark. 

Cold sweat drenched the polypro fabric 
clinging to his chest. The tang of constant 
anxiety oiled the fatigue lines etched into his 
face. The slightest mistake, the tiniest unintended
twitch of a muscle, and he could easily
break a chimney and bring the entire totter- 
ing structure down on the submersible. If he 
were lucky it would breach the observation 
bell and he would be dead a few tenths of a 
second later. If he were not so fortunate, it 
would cripple the craft, leaving him drifting 
and helpless in the dark. Communication 
with the rest of the crew awaiting his return 
at the surface was impossible. There would be 
no final cry for help; he would never be found. 

Ricardo licked

A measure of life.


the submersible was drifting at this very 
moment. He could have been floating in 
himself. It had been ten years since Rosa had 
died. His wife, his bride, so young. Why did 
he have to travel so far from home to exile 
himself from his own darkness? 

And yet here he was, floating in the abyss. 


The spotlights fell upon a brilliant white 
chevron in the silt-shrouded murk. At first 
he thought it was enormous — there was 
no sense of scale, nothing familiar and 
human by which to judge the size of objects. 
It winked out, then appeared again, and 
Ricardo realized it was close at hand, some- 
thing partially occluded by the soot-black 
columns of the cryovolcano. 

Something that moved. 

He let the submersible drift. Whatever it 
was, he didn't want to startle it. 

Slowly it came into view. 

An alabaster-white carapace. Crimson- 
tipped thorns cresting sharp-jointed legs. A 
hooked beak framed by feathery fronds that 
sculled and groped at the deep. 

It was a monstrosity pried from the oil- 
cake layers of the Burgess shale and jolted to 
life. The gangly and utterly alien way it moved 
was infused with a crawling strangeness that 
sent chills prickling up Ricardo’s spine, across
his shoulders and into the base of his brain. 
The words crab and spider and giant squid 
flashed through his mind, but of course it was 
none of these. He settled on xeno-arachnid, 
because, a man of science, he could not bring 
himself to call it what it was: monster. 

Its fronds quivered and reached out. Prob- 
ing. Curious. Angling his way. Suddenly 
Ricardo’s mind flashed with comprehension:
the creature had nothing that he recognized 
as eyes — but it, too, was dumbfounded with 
wonder.
Its gaping beak seemed to gnaw at the
darkness. It lifted two thorned legs, not 
threatening, slowly extending them towards 
the spotlights on the front of the submers- 
ible. Ricardo was about to pull back when the 
xeno-arachnid halted. Its fronds undulated 
in the shadows cast by its limbs. The beak 
repeated its gnawing motion, then 
again a third time. Slowly. More 
deliberately. Ricardo gasped and 
his eyes went wide. 

It was trying to tell him something. 

But what? 

Ricardo thought he glimpsed 
a shimmer, an iridescence just at 
the limits of his perception. He fin- 
gered the toggles for the spotlights 
and the interior lights, flicked them 
off one by one, and plunged himself 
into abject darkness. 

But as his eyes adjusted, he real- 
ized the darkness was not absolute, 
the darkness was not eternal. 

Not at all. He had only just begun 
to see the light. 

The xeno-arachnid’s legs glowed with a 
ghostly bioluminescence. Its carapace grew 
brighter and slowly turned to face him. The 
legs — four of them working in unison — 
scrabbled across the surface, wove in sombre 
blues and muted whites a tapestry of over- 
lapping calligraphies that became more and 
more complex with each pass of its limbs. 

The pattern lightning-bolted in Ricardo’s 
mind to something he recognized, to pat- 
terns larger still. The massive pack-ice shards 
of Europa’s frozen crust. The jumbled cunei- 
form of pressure ridges and rifts stamped 
into the icy potsherds fracturing the surface. 

The rafting of the Europan surface was not 
random at all.

The creature was writing its story, a small 
fragment of the same immense narrative that 
was etched into the Rosetta-stone shards that 
circumscribed the Europan globe. 

Ricardo held no proof, but he knew. The 
xeno-arachnid was telling him. He knew. 
The light in the darkness was written on its 
carapace. The creature was like him, a kin- 
dred spirit, an exile, and their names were 
scrawled upon the ostracons of Europa.